On 2025-07-21 14:06:48 [+0900], Romain Guyard wrote:
> Hello,

Hi,

> [ 2349.629427] Hardware name: ADLINK TECHNOLOGY Inc. -612X/-612X, BIOS
> [ 2349.629454] </TASK>
> [ 2412.634282] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 2412.634284] rcu: Tasks blocked on level-0 rcu_node (CPUs 0-15):
> P12083/1:b..l P12724/1:b..l P12725/1:b..l P4057/3:b..l
> [ 2412.634289] rcu: (detected by 14, t=147008 jiffies, g=355917, q=9582
> ncpus=16)
> [ 2412.634290] task:Xorg state:D stack:0 pid:4057 tgid:4057
> ppid:4055 task_flags:0x400100 flags:0x00004000
> [ 2412.634292] Call Trace:
> [ 2412.634293] <TASK>
> [ 2412.634295] __schedule+0x44c/0xad0
> [ 2412.634302] schedule_rtlock+0x25/0x40
> [ 2412.634303] rtlock_slowlock_locked+0x20d/0xe00
> [ 2412.634307] rt_spin_lock+0x7a/0xd0
> [ 2412.634309] execlists_submission_tasklet+0x143/0x14d0
> [ 2412.634354] tasklet_action_common+0xc1/0x230
> [ 2412.634356] handle_softirqs.constprop.0+0xce/0x280
> [ 2412.634358] __local_bh_enable_ip+0xa0/0xd0
> [ 2412.634359] i915_gem_do_execbuffer+0x1a73/0x2920

This blocks on a lock and waits to make progress. I did not find out who
is holding that one, though.

…
> [ 2412.634511] </TASK>
> [ 2412.634511] task:kworker/14:1 state:R running task stack:0
> pid:12083 tgid:12083 ppid:2 task_flags:0x4208060 flags:0x00004000
> [ 2412.634513] Workqueue: i915-unordered engine_retire
> [ 2412.634515] Call Trace:
> [ 2412.634516] <TASK>
> [ 2412.634516] __schedule+0x44c/0xad0
> [ 2412.634520] preempt_schedule_common+0x31/0x80
> [ 2412.634521] preempt_schedule_thunk+0x16/0x30
> [ 2412.634523] migrate_enable+0xe6/0x100
> [ 2412.634525] rt_spin_unlock+0x12/0x40
> [ 2412.634526] remove_from_engine+0x76/0xc0
> [ 2412.634528] i915_request_retire.part.0+0x7c/0x220
> [ 2412.634530] engine_retire+0xc3/0x100
> [ 2412.634531] process_one_work+0x166/0x390
> [ 2412.634533] worker_thread+0x29d/0x3c0

This might be the one. The task is in running state, so I don't
understand what is holding the scheduler back from putting it back on
the CPU. There is at least one idle CPU available, but although this
workqueue is called i915-unordered, the work must complete on the same
CPU (it can't migrate). So what is CPU14 doing? It should schedule
something and not be idle.

> Looks like there are some i915 locking stuff in those BTs.
>
> I am not very knowledgeable about i915 and RT, so my help is quite limited,
> but since this is easily reproduced (always crash or hangs after <1H), I can
> try things.

I don't know what you can retrieve from the kdump, but I guess CPU14
should be spinning on something. RCU complains about not making
progress. If RCU-boost is enabled then the kworker should have one more
reason to be on the CPU.

Could you try v6.17-rc? I didn't add anything i915 related.
Could you please enable CONFIG_PROVE_LOCKING and
CONFIG_DEBUG_ATOMIC_SLEEP and check if the kernel complains? Maybe there
is something new I haven't noticed.

> Thank you!
>
> Romain Guyard

Sebastian