>On Tue, Feb 25, 2025 at 12:31:32PM +0000, Ran Xiaokai wrote:
>> From: Ran Xiaokai <[email protected]>
>> 
>> Lockdep reports this deadlock log:
>> ============================================
>> WARNING: possible recursive locking detected
>> --------------------------------------------
>> sh/31444 is trying to acquire lock:
>> ffffffff82c51af0 (cpu_hotplug_lock){++++}-{0:0}, at:
>> stop_per_cpu_kthreads+0x7/0x60
>> 
>> but task is already holding lock:
>> ffffffff82c51af0 (cpu_hotplug_lock){++++}-{0:0}, at:
>> start_per_cpu_kthreads+0x28/0x140
>> 
>> other info that might help us debug this:
>>  Possible unsafe locking scenario:
>> 
>>        CPU0
>>        ----
>>   lock(cpu_hotplug_lock);
>>   lock(cpu_hotplug_lock);
>> 
>> Call Trace:
>>  <TASK>
>>  __lock_acquire+0x1612/0x29b0
>>  lock_acquire+0xd0/0x2e0
>>  cpus_read_lock+0x49/0x120
>>  stop_per_cpu_kthreads+0x7/0x60
>>  start_kthread+0x105/0x120
>>  start_per_cpu_kthreads+0xdd/0x140
>>  osnoise_workload_start+0x261/0x2f0
>>  osnoise_tracer_start+0x18/0x4
>> 
>> In start_kthread(), when kthread_run_on_cpu() fails,
>> cpus_read_unlock() should be called before stop_per_cpu_kthreads(),
>> but both start_per_cpu_kthreads() and start_kthread() call the error
>> handling routine stop_per_cpu_kthreads(),
>> which is redundant. Only one call is necessary.
>> To fix this, move stop_per_cpu_kthreads() outside of start_kthread(),
>> use the return value of start_kthread() to determine kthread creation
>> error.
>> The same issue exists in osnoise_hotplug_workfn() too.
>> 
>> Reviewed-by: Yang Guang <[email protected]>
>> Reviewed-by: Wang Yong <[email protected]>
>> Signed-off-by: Ran Xiaokai <[email protected]>
>> ---
>>  kernel/trace/trace_osnoise.c | 10 +++++++---
>>  1 file changed, 7 insertions(+), 3 deletions(-)
>> 
>> diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
>> index 92e16f03fa4e..38fb0c655f5b 100644
>> --- a/kernel/trace/trace_osnoise.c
>> +++ b/kernel/trace/trace_osnoise.c
>> @@ -2029,7 +2029,6 @@ static int start_kthread(unsigned int cpu)
>>  
>>      if (IS_ERR(kthread)) {
>>              pr_err(BANNER "could not start sampling thread\n");
>> -            stop_per_cpu_kthreads();
>>              return -ENOMEM;
>>      }
>>  
>> @@ -2097,7 +2096,7 @@ static void osnoise_hotplug_workfn(struct
>> work_struct *dummy)
>>              return;
>>  
>>      guard(mutex)(&interface_lock);
>> -    guard(cpus_read_lock)();
>> +    cpus_read_lock();
>>  
>>      if (!cpu_online(cpu))
>>              return;
>> @@ -2105,7 +2104,12 @@ static void osnoise_hotplug_workfn(struct
>> work_struct *dummy)
>>      if (!cpumask_test_cpu(cpu, &osnoise_cpumask))
>>              return;
>>  
>> -    start_kthread(cpu);
>> +    if (start_kthread(cpu)) {
>> +            cpus_read_unlock();
>> +            stop_per_cpu_kthreads();
>
>Is it right to call stop_per_cpu_kthreads() which stops osnoise kthread
>for every other CPUs in the system if a failure occurs during hotplug of a
>CPU?

I also suspect that this is not a rational behavior.

>On another note, since stop_per_cpu_kthreads() invokes stop_kthread()
>for every online CPU. It's better to remove stop_per_cpu_kthreads() from
>start_kthread() and handle the error in `osnoise_hotplug_workfn` 

Hi, Vishal
I did this in my first versin, something like this:

@@ -2097,7 +2096,7 @@ static void osnoise_hotplug_workfn(struct
work_struct *dummy)
                return;
 
        guard(mutex)(&interface_lock);
-       guard(cpus_read_lock)();
+       cpus_read_lock();
 
        if (!cpu_online(cpu)) {
+               cpus_read_unlock();
                return;
        }

        if (!cpumask_test_cpu(cpu, &osnoise_cpumask)) {
+               cpus_read_unlock();
                return;
        }
-       start_kthread(cpu);
+       if (start_kthread(cpu)) {
+               cpus_read_unlock();
+               stop_per_cpu_kthreads();
+               return;
+       }
+       cpus_read_unlock();
 }

We have to drop the guard() and call unlock() manually,
this somewhat makes the code redundant.

>       Vishal
>> +            return;
>> +    }
>> +    cpus_read_unlock();
>>  }
>>  
>>  static DECLARE_WORK(osnoise_hotplug_work, osnoise_hotplug_workfn);
>> -- 
>> 2.15.2
>> 


Reply via email to