On 2024/10/4 4:19, Steven Rostedt wrote:
> On Tue, 24 Sep 2024 17:45:15 +0800
> Wei Li <[email protected]> wrote:
> 
>> Another "hung task" error was reported during the test, and i figured out
>> the deadlock scenario is as follows:
>>
>> T1 [BP]               | T2 [AP]                     | T3 [hwlatd/1]          
>>         | T4
>> work_for_cpu_fn()     | cpuhp_thread_fun()          | kthread_fn()           
>>         | hwlat_hotplug_workfn()
>>   _cpu_down()         |   stop_cpu_kthread()        |                        
>>         |   mutex_lock(&hwlat_data.lock)
>>     cpus_write_lock() |     kthread_stop(hwlatd/1)  |   
>> mutex_lock(&hwlat_data.lock) |
>>     __cpuhp_kick_ap() |       wait_for_completion() |                        
>>         |   cpus_read_lock()
>>
>> It constitutes ABBA deadlock indirectly beAs it calls msleep_interruptible() 
>> and 'break' if signal pending below, i choosed 'break' here too.tween 
>> "cpu_hotplug_lock" and
>> "hwlat_data.lock", make the mutex obtaining in kthread_fn() interruptible
>> to fix this.
>>
>> Fixes: ba998f7d9531 ("trace/hwlat: Support hotplug operations")
>> Signed-off-by: Wei Li <[email protected]>
>> ---
>>  kernel/trace/trace_hwlat.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
>> index 3bd6071441ad..4c228ccb8a38 100644
>> --- a/kernel/trace/trace_hwlat.c
>> +++ b/kernel/trace/trace_hwlat.c
>> @@ -370,7 +370,8 @@ static int kthread_fn(void *data)
>>              get_sample();
>>              local_irq_enable();
>>  
>> -            mutex_lock(&hwlat_data.lock);
>> +            if (mutex_lock_interruptible(&hwlat_data.lock))
>> +                    break;
> 
> So basically this requires as signal to break it out of the loop?
> 
> But if it receives a signal for any other reason, it breaks out of the loop
> too. Which is not what we want. If anything, it should be:
> 
>               if (mutex_lock_interruptible(&hwlat_data.lock))
>                       continue;

As it calls msleep_interruptible() below and 'break' if signal pending, i
choosed 'break' here too.

> But I still don't really like this solution, as it will still report a
>       deadlock.
> 
> Is it possible to switch the cpu_read_lock() to be taken before the
> hwlat_data.lock?

It's a little hard to change the sequence of these two locks, we'll hold
"cpu_hotplug_lock" for longer unnecessarily if we do that.

But maybe we can remove the "hwlat_data.lock" in kthread_fn(), let me try
another modification.


Thanks,
Wei

Reply via email to