>On Tue, 24 Mar 2026 15:06:16 +0800 (CST)
><[email protected]> wrote:
>
>> From: luohaiyang10243395 <[email protected]>
>> 
>> The following sequence may lead to a deadlock in CPU hotplug:
>> 
>>   CPU0                        |  CPU1
>>                               |  schedule_work_on
>>                               |
>>   _cpu_down//set CPU1 offline |
>>   cpus_write_lock             |
>>                               |  osnoise_hotplug_workfn
>>                               |    mutex_lock(&interface_lock);
>>                               |    cpus_read_lock();  //wait cpu_hotplug_lock
>>                               |
>>                               |  cpuhp/1
>>                               |    osnoise_cpu_die
>>                               |      kthread_stop
>>                               |        wait_for_completion //wait osnoise/1 exit
>>                               |
>>                               |  osnoise/1
>>                               |    osnoise_sleep
>>                               |      mutex_lock(&interface_lock); //deadlock
>> 
>> Fix by swapping the order of cpus_read_lock() and mutex_lock(&interface_lock).
>
>So the deadlock is due to the "wait_for_completion"?

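The osnoise_cpu_init callback only queues the work (schedule_work_on) and
returns immediately, so a CPU offline operation can start before
osnoise_hotplug_workfn has run.
The offline task holds cpu_hotplug_lock while it waits for the osnoise task
to exit.
If osnoise_hotplug_workfn acquires interface_lock first, it then blocks in
cpus_read_lock(), and the osnoise task blocks on interface_lock in
osnoise_sleep, so it never exits and the offline task waits forever.
This is an ABBA deadlock.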

>How did you find this bug? Inspection, AI, triggered?
>
>Thanks,
>
>-- Steve

We ran automated tests on kernel 6.6, which reported the following hung task
warnings, and we think the same issue exists in linux-stable.
 [39401.476843] INFO: task cpuhp/7:47 blocked for more than 120 seconds.
 [39401.483196]       Tainted: G            E      6.6.102-5.2.1.an23.103.aarch64 #1
 [39401.490581] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 [39401.498398] task:cpuhp/7         state:D stack:0     pid:47    ppid:2      flags:0x00000208
 [39401.506739] Call trace:
 [39401.509175]  __switch_to+0x138/0x180
 [39401.512743]  __schedule+0x250/0x5e8
 [39401.516220]  schedule+0x60/0x100
 [39401.519437]  schedule_timeout+0x1a0/0x1c0
 [39401.523437]  wait_for_completion+0xbc/0x190
 [39401.527609]  kthread_stop+0x7c/0x268
 [39401.531175]  stop_kthread+0x8c/0x178
 [39401.534740]  osnoise_cpu_die+0xc/0x18
 [39401.538391]  cpuhp_invoke_callback+0x148/0x580
 [39401.542822]  cpuhp_thread_fun+0xc8/0x1a0
 [39401.546733]  smpboot_thread_fn+0x224/0x250
 [39401.550817]  kthread+0xf8/0x110
 [39401.553947]  ret_from_fork+0x10/0x20
 [39401.557545] INFO: task sh:28856 blocked for more than 120 seconds.
 [39401.563713]       Tainted: G            E      6.6.102-5.2.1.an23.103.aarch64 #1
 [39401.571095] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 [39401.578912] task:sh              state:D stack:0     pid:28856 ppid:1      flags:0x00800004
 [39401.587251] Call trace:
 [39401.589685]  __switch_to+0x138/0x180
 [39401.593250]  __schedule+0x250/0x5e8
 [39401.596725]  schedule+0x60/0x100
 [39401.599941]  schedule_timeout+0x1a0/0x1c0
 [39401.603940]  wait_for_completion+0xbc/0x190
 [39401.608113]  __flush_work+0x5c/0xa8
 [39401.611590]  work_on_cpu_key+0x88/0xc0
 [39401.615331]  cpu_down_maps_locked+0xd0/0xe8
 [39401.619503]  cpu_device_down+0x38/0x60
 [39401.623240]  cpu_subsys_offline+0x14/0x28
 [39401.627238]  device_offline+0xb8/0x130
 [39401.630976]  online_store+0x64/0xe0
 [39401.634453]  dev_attr_store+0x1c/0x38
 [39401.638104]  sysfs_kf_write+0x48/0x60
 [39401.641756]  kernfs_fop_write_iter+0x118/0x1e8
 [39401.646188]  vfs_write+0x1a4/0x2f8
 [39401.649580]  ksys_write+0x70/0x108
 [39401.652970]  __arm64_sys_write+0x20/0x30
 [39401.656880]  el0_svc_common.constprop.0+0x60/0x138
 [39401.661660]  do_el0_svc+0x20/0x30
 [39401.664964]  el0_svc+0x44/0x1f8
 [39401.668093]  el0t_64_sync_handler+0xf8/0x128
 [39401.672352]  el0t_64_sync+0x17c/0x180
 [39401.875086] INFO: task kworker/7:2:2314252 blocked for more than 121 seconds.
 [39401.882208]       Tainted: G            E      6.6.102-5.2.1.an23.103.aarch64 #1
 [39401.889590] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 [39401.897406] task:kworker/7:2     state:D stack:0     pid:2314252 ppid:2      flags:0x00000008
 [39401.905917] Workqueue: events osnoise_hotplug_workfn
 [39401.910871] Call trace:
 [39401.913306]  __switch_to+0x138/0x180
 [39401.916870]  __schedule+0x250/0x5e8
 [39401.920345]  schedule+0x60/0x100
 [39401.923561]  percpu_rwsem_wait+0xfc/0x128
 [39401.927559]  __percpu_down_read+0x60/0x198
 [39401.931644]  percpu_down_read.constprop.0+0xac/0xb8
 [39401.936510]  cpus_read_lock+0x14/0x20
 [39401.940160]  osnoise_hotplug_workfn+0x54/0xb0
 [39401.944506]  process_one_work+0x184/0x420
 [39401.948503]  worker_thread+0x2b4/0x3d8
 [39401.952241]  kthread+0xf8/0x110
 [39401.955370]  ret_from_fork+0x10/0x20
 [39402.125508] INFO: task osnoise/0:2356235 blocked for more than 121 seconds.
 [39402.132458]       Tainted: G            E      6.6.102-5.2.1.an23.103.aarch64 #1
 [39402.139840] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 [39402.147656] task:osnoise/0       state:D stack:0     pid:2356235 ppid:2      flags:0x00000008
 [39402.156168] Call trace:
 [39402.158602]  __switch_to+0x138/0x180
 [39402.162166]  __schedule+0x250/0x5e8
 [39402.165643]  schedule+0x60/0x100
 [39402.168860]  schedule_preempt_disabled+0x28/0x48
 [39402.173466]  __mutex_lock.constprop.0+0x324/0x5f8
 [39402.178158]  __mutex_lock_slowpath+0x18/0x28
 [39402.182416]  mutex_lock+0x64/0x78
 [39402.185720]  osnoise_sleep+0x30/0x130
 [39402.189371]  osnoise_main+0x164/0x190
 [39402.193021]  kthread+0xf8/0x110
 [39402.196149]  ret_from_fork+0x10/0x20
 [39402.199713] INFO: task osnoise/1:2356236 blocked for more than 121 seconds.
 [39402.206661]       Tainted: G            E      6.6.102-5.2.1.an23.103.aarch64 #1
 [39402.214044] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 [39402.221860] task:osnoise/1       state:D stack:0     pid:2356236 ppid:2      flags:0x00000008
 [39402.230372] Call trace:
 [39402.232804]  __switch_to+0x138/0x180
 [39402.236368]  __schedule+0x250/0x5e8
 [39402.239845]  schedule+0x60/0x100
 [39402.243061]  schedule_preempt_disabled+0x28/0x48
 [39402.247666]  __mutex_lock.constprop.0+0x324/0x5f8
 [39402.252359]  __mutex_lock_slowpath+0x18/0x28
 [39402.256618]  mutex_lock+0x64/0x78
 [39402.259921]  osnoise_sleep+0x30/0x130
 [39402.263572]  osnoise_main+0x164/0x190
 [39402.267223]  kthread+0xf8/0x110
 [39402.270352]  ret_from_fork+0x10/0x20
 [39402.273916] INFO: task osnoise/2:2356237 blocked for more than 121 seconds.
 [39402.280865]       Tainted: G            E      6.6.102-5.2.1.an23.103.aarch64 #1
 [39402.288247] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 [39402.296064] task:osnoise/2       state:D stack:0     pid:2356237 ppid:2      flags:0x00000008
 [39402.304575] Call trace:
 [39402.307010]  __switch_to+0x138/0x180
 [39402.310574]  __schedule+0x250/0x5e8
 [39402.314051]  schedule+0x60/0x100
 [39402.317268]  schedule_preempt_disabled+0x28/0x48
 [39402.321873]  __mutex_lock.constprop.0+0x324/0x5f8
 [39402.326566]  __mutex_lock_slowpath+0x18/0x28
 [39402.330824]  mutex_lock+0x64/0x78
 [39402.334128]  osnoise_sleep+0x30/0x130
 [39402.337778]  osnoise_main+0x164/0x190
 [39402.341429]  kthread+0xf8/0x110
 [39402.344556]  ret_from_fork+0x10/0x20
 [39402.348120] Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings
 [39402.356295] Kernel panic - not syncing: hung_task: blocked tasks 

Thanks,
Haiyang

>> 
>> Signed-off-by: Luo Haiyang <[email protected]>
>> ---
>>  kernel/trace/trace_osnoise.c | 10 +++++-----
>>  1 file changed, 5 insertions(+), 5 deletions(-)
>> 
>> diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
>> index dee610e465b9..be6cf0bb3c03 100644
>> --- a/kernel/trace/trace_osnoise.c
>> +++ b/kernel/trace/trace_osnoise.c
>> @@ -2073,8 +2073,8 @@ static void osnoise_hotplug_workfn(struct work_struct *dummy)
>>      if (!osnoise_has_registered_instances())
>>          return;
>> 
>> -    guard(mutex)(&interface_lock);
>>      guard(cpus_read_lock)();
>> +    guard(mutex)(&interface_lock);
>> 
>>      if (!cpu_online(cpu))
>>          return;
>> @@ -2237,11 +2237,11 @@ static ssize_t osnoise_options_write(struct file *filp, const char __user *ubuf,
>>      if (running)
>>          stop_per_cpu_kthreads();
>> 
>> -    mutex_lock(&interface_lock);
>>      /*
>>       * avoid CPU hotplug operations that might read options.
>>       */
>>      cpus_read_lock();
>> +    mutex_lock(&interface_lock);
>> 
>>      retval = cnt;
>> 
>> @@ -2257,8 +2257,8 @@ static ssize_t osnoise_options_write(struct file *filp, const char __user *ubuf,
>>              clear_bit(option, &osnoise_options);
>>      }
>> 
>> -    cpus_read_unlock();
>>      mutex_unlock(&interface_lock);
>> +    cpus_read_unlock();
>> 
>>      if (running)
>>          start_per_cpu_kthreads();
>> @@ -2345,16 +2345,16 @@ osnoise_cpus_write(struct file *filp, const char __user *ubuf, size_t count,
>>      if (running)
>>          stop_per_cpu_kthreads();
>> 
>> -    mutex_lock(&interface_lock);
>>      /*
>>       * osnoise_cpumask is read by CPU hotplug operations.
>>       */
>>      cpus_read_lock();
>> +    mutex_lock(&interface_lock);
>> 
>>      cpumask_copy(&osnoise_cpumask, osnoise_cpumask_new);
>> 
>> -    cpus_read_unlock();
>>      mutex_unlock(&interface_lock);
>> +    cpus_read_unlock();
>> 
>>      if (running)
>>          start_per_cpu_kthreads();
