> On Tue, 24 Mar 2026 15:06:16 +0800 (CST)
> <[email protected]> wrote:
>
>> From: luohaiyang10243395 <[email protected]>
>>
>> The following sequence may lead to a deadlock during CPU hotplug:
>>
>> CPU0                          | CPU1
>>                               | schedule_work_on
>>                               |
>> _cpu_down // set CPU1 offline |
>>   cpus_write_lock             |
>>                               | osnoise_hotplug_workfn
>>                               |   mutex_lock(&interface_lock);
>>                               |   cpus_read_lock(); // waits for cpu_hotplug_lock
>>                               |
>>                               | cpuhp/1
>>                               |   osnoise_cpu_die
>>                               |     kthread_stop
>>                               |       wait_for_completion // waits for osnoise/1 to exit
>>                               |
>>                               | osnoise/1
>>                               |   osnoise_sleep
>>                               |     mutex_lock(&interface_lock); // deadlock
>>
>> Fix by swapping the order of cpus_read_lock() and mutex_lock(&interface_lock).
>
> So the deadlock is due to the "wait_for_completion"?
The osnoise_cpu_init callback returns directly, which may allow a CPU
offline operation to run on another CPU. The offline path holds
cpu_hotplug_lock while waiting for the osnoise kthread to exit;
meanwhile, osnoise_hotplug_workfn may have acquired interface_lock
first, blocking the offline path. This is an ABBA deadlock.

> How did you find this bug? Inspection, AI, triggered?
>
> Thanks,
>
> -- Steve

We run autotests on kernel 6.6, which reported the following hung task
warning; we believe the same issue exists in linux-stable.

[39401.476843] INFO: task cpuhp/7:47 blocked for more than 120 seconds.
[39401.483196]       Tainted: G            E      6.6.102-5.2.1.an23.103.aarch64 #1
[39401.490581] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[39401.498398] task:cpuhp/7         state:D stack:0     pid:47    ppid:2      flags:0x00000208
[39401.506739] Call trace:
[39401.509175]  __switch_to+0x138/0x180
[39401.512743]  __schedule+0x250/0x5e8
[39401.516220]  schedule+0x60/0x100
[39401.519437]  schedule_timeout+0x1a0/0x1c0
[39401.523437]  wait_for_completion+0xbc/0x190
[39401.527609]  kthread_stop+0x7c/0x268
[39401.531175]  stop_kthread+0x8c/0x178
[39401.534740]  osnoise_cpu_die+0xc/0x18
[39401.538391]  cpuhp_invoke_callback+0x148/0x580
[39401.542822]  cpuhp_thread_fun+0xc8/0x1a0
[39401.546733]  smpboot_thread_fn+0x224/0x250
[39401.550817]  kthread+0xf8/0x110
[39401.553947]  ret_from_fork+0x10/0x20
[39401.557545] INFO: task sh:28856 blocked for more than 120 seconds.
[39401.563713]       Tainted: G            E      6.6.102-5.2.1.an23.103.aarch64 #1
[39401.571095] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[39401.578912] task:sh              state:D stack:0     pid:28856 ppid:1      flags:0x00800004
[39401.587251] Call trace:
[39401.589685]  __switch_to+0x138/0x180
[39401.593250]  __schedule+0x250/0x5e8
[39401.596725]  schedule+0x60/0x100
[39401.599941]  schedule_timeout+0x1a0/0x1c0
[39401.603940]  wait_for_completion+0xbc/0x190
[39401.608113]  __flush_work+0x5c/0xa8
[39401.611590]  work_on_cpu_key+0x88/0xc0
[39401.615331]  cpu_down_maps_locked+0xd0/0xe8
[39401.619503]  cpu_device_down+0x38/0x60
[39401.623240]  cpu_subsys_offline+0x14/0x28
[39401.627238]  device_offline+0xb8/0x130
[39401.630976]  online_store+0x64/0xe0
[39401.634453]  dev_attr_store+0x1c/0x38
[39401.638104]  sysfs_kf_write+0x48/0x60
[39401.641756]  kernfs_fop_write_iter+0x118/0x1e8
[39401.646188]  vfs_write+0x1a4/0x2f8
[39401.649580]  ksys_write+0x70/0x108
[39401.652970]  __arm64_sys_write+0x20/0x30
[39401.656880]  el0_svc_common.constprop.0+0x60/0x138
[39401.661660]  do_el0_svc+0x20/0x30
[39401.664964]  el0_svc+0x44/0x1f8
[39401.668093]  el0t_64_sync_handler+0xf8/0x128
[39401.672352]  el0t_64_sync+0x17c/0x180
[39401.875086] INFO: task kworker/7:2:2314252 blocked for more than 121 seconds.
[39401.882208]       Tainted: G            E      6.6.102-5.2.1.an23.103.aarch64 #1
[39401.889590] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[39401.897406] task:kworker/7:2     state:D stack:0     pid:2314252 ppid:2      flags:0x00000008
[39401.905917] Workqueue: events osnoise_hotplug_workfn
[39401.910871] Call trace:
[39401.913306]  __switch_to+0x138/0x180
[39401.916870]  __schedule+0x250/0x5e8
[39401.920345]  schedule+0x60/0x100
[39401.923561]  percpu_rwsem_wait+0xfc/0x128
[39401.927559]  __percpu_down_read+0x60/0x198
[39401.931644]  percpu_down_read.constprop.0+0xac/0xb8
[39401.936510]  cpus_read_lock+0x14/0x20
[39401.940160]  osnoise_hotplug_workfn+0x54/0xb0
[39401.944506]  process_one_work+0x184/0x420
[39401.948503]  worker_thread+0x2b4/0x3d8
[39401.952241]  kthread+0xf8/0x110
[39401.955370]  ret_from_fork+0x10/0x20
[39402.125508] INFO: task osnoise/0:2356235 blocked for more than 121 seconds.
[39402.132458]       Tainted: G            E      6.6.102-5.2.1.an23.103.aarch64 #1
[39402.139840] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[39402.147656] task:osnoise/0       state:D stack:0     pid:2356235 ppid:2      flags:0x00000008
[39402.156168] Call trace:
[39402.158602]  __switch_to+0x138/0x180
[39402.162166]  __schedule+0x250/0x5e8
[39402.165643]  schedule+0x60/0x100
[39402.168860]  schedule_preempt_disabled+0x28/0x48
[39402.173466]  __mutex_lock.constprop.0+0x324/0x5f8
[39402.178158]  __mutex_lock_slowpath+0x18/0x28
[39402.182416]  mutex_lock+0x64/0x78
[39402.185720]  osnoise_sleep+0x30/0x130
[39402.189371]  osnoise_main+0x164/0x190
[39402.193021]  kthread+0xf8/0x110
[39402.196149]  ret_from_fork+0x10/0x20
[39402.199713] INFO: task osnoise/1:2356236 blocked for more than 121 seconds.
[39402.206661]       Tainted: G            E      6.6.102-5.2.1.an23.103.aarch64 #1
[39402.214044] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[39402.221860] task:osnoise/1       state:D stack:0     pid:2356236 ppid:2      flags:0x00000008
[39402.230372] Call trace:
[39402.232804]  __switch_to+0x138/0x180
[39402.236368]  __schedule+0x250/0x5e8
[39402.239845]  schedule+0x60/0x100
[39402.243061]  schedule_preempt_disabled+0x28/0x48
[39402.247666]  __mutex_lock.constprop.0+0x324/0x5f8
[39402.252359]  __mutex_lock_slowpath+0x18/0x28
[39402.256618]  mutex_lock+0x64/0x78
[39402.259921]  osnoise_sleep+0x30/0x130
[39402.263572]  osnoise_main+0x164/0x190
[39402.267223]  kthread+0xf8/0x110
[39402.270352]  ret_from_fork+0x10/0x20
[39402.273916] INFO: task osnoise/2:2356237 blocked for more than 121 seconds.
[39402.280865]       Tainted: G            E      6.6.102-5.2.1.an23.103.aarch64 #1
[39402.288247] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[39402.296064] task:osnoise/2       state:D stack:0     pid:2356237 ppid:2      flags:0x00000008
[39402.304575] Call trace:
[39402.307010]  __switch_to+0x138/0x180
[39402.310574]  __schedule+0x250/0x5e8
[39402.314051]  schedule+0x60/0x100
[39402.317268]  schedule_preempt_disabled+0x28/0x48
[39402.321873]  __mutex_lock.constprop.0+0x324/0x5f8
[39402.326566]  __mutex_lock_slowpath+0x18/0x28
[39402.330824]  mutex_lock+0x64/0x78
[39402.334128]  osnoise_sleep+0x30/0x130
[39402.337778]  osnoise_main+0x164/0x190
[39402.341429]  kthread+0xf8/0x110
[39402.344556]  ret_from_fork+0x10/0x20
[39402.348120] Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings
[39402.356295] Kernel panic - not syncing: hung_task: blocked tasks

Thanks,
Haiyang

>>
>> Signed-off-by: Luo Haiyang <[email protected]>
>> ---
>>  kernel/trace/trace_osnoise.c | 10 +++++-----
>>  1 file changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
>> index dee610e465b9..be6cf0bb3c03 100644
>> --- a/kernel/trace/trace_osnoise.c
>> +++ b/kernel/trace/trace_osnoise.c
>> @@ -2073,8 +2073,8 @@ static void osnoise_hotplug_workfn(struct work_struct *dummy)
>>  	if (!osnoise_has_registered_instances())
>>  		return;
>>
>> -	guard(mutex)(&interface_lock);
>>  	guard(cpus_read_lock)();
>> +	guard(mutex)(&interface_lock);
>>
>>  	if (!cpu_online(cpu))
>>  		return;
>> @@ -2237,11 +2237,11 @@ static ssize_t osnoise_options_write(struct file *filp, const char __user *ubuf,
>>  	if (running)
>>  		stop_per_cpu_kthreads();
>>
>> -	mutex_lock(&interface_lock);
>>  	/*
>>  	 * avoid CPU hotplug operations that might read options.
>>  	 */
>>  	cpus_read_lock();
>> +	mutex_lock(&interface_lock);
>>
>>  	retval = cnt;
>>
>> @@ -2257,8 +2257,8 @@ static ssize_t osnoise_options_write(struct file *filp, const char __user *ubuf,
>>  		clear_bit(option, &osnoise_options);
>>  	}
>>
>> -	cpus_read_unlock();
>>  	mutex_unlock(&interface_lock);
>> +	cpus_read_unlock();
>>
>>  	if (running)
>>  		start_per_cpu_kthreads();
>>
>> @@ -2345,16 +2345,16 @@ osnoise_cpus_write(struct file *filp, const char __user *ubuf, size_t count,
>>  	if (running)
>>  		stop_per_cpu_kthreads();
>>
>> -	mutex_lock(&interface_lock);
>>  	/*
>>  	 * osnoise_cpumask is read by CPU hotplug operations.
>>  	 */
>>  	cpus_read_lock();
>> +	mutex_lock(&interface_lock);
>>
>>  	cpumask_copy(&osnoise_cpumask, osnoise_cpumask_new);
>>
>> -	cpus_read_unlock();
>>  	mutex_unlock(&interface_lock);
>> +	cpus_read_unlock();
>>
>>  	if (running)
>>  		start_per_cpu_kthreads();
