Hi Steven,
On 2024/11/13 7:50, Steven Rostedt wrote:
>>>> Another "hung task" error was reported during the test, and I figured out
>>>> that the deadlock scenario is as follows:
>>>>
>>>> T1 [BP]           | T2 [AP]                | T3 [hwlatd/1]                | T4
>>>> work_for_cpu_fn() | cpuhp_thread_fun()     | kthread_fn()                 | hwlat_hotplug_workfn()
>>>> _cpu_down()       |                        |                              | mutex_lock(&hwlat_data.lock)
>>>> cpus_write_lock() | stop_cpu_kthread()     |                              |
>>>>                   | kthread_stop(hwlatd/1) | mutex_lock(&hwlat_data.lock) |
>>>> __cpuhp_kick_ap() | wait_for_completion()  |                              | cpus_read_lock()
>
> So, if we can make T3 not take the mutex_lock then that should be a
> solution, right?
>
>>>>
>>>> It constitutes an ABBA deadlock indirectly between "cpu_hotplug_lock" and
>>>> "hwlat_data.lock"; make the mutex acquisition in kthread_fn() interruptible
>>>> to fix this.
>>>>
>>>> Fixes: ba998f7d9531 ("trace/hwlat: Support hotplug operations")
>>>> Signed-off-by: Wei Li <[email protected]>
>>>> ---
>>>> kernel/trace/trace_hwlat.c | 3 ++-
>>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
>>>> index 3bd6071441ad..4c228ccb8a38 100644
>>>> --- a/kernel/trace/trace_hwlat.c
>>>> +++ b/kernel/trace/trace_hwlat.c
>>>> @@ -370,7 +370,8 @@ static int kthread_fn(void *data)
>>>> get_sample();
>>>> local_irq_enable();
>>>>
>>>> - mutex_lock(&hwlat_data.lock);
>>>> + if (mutex_lock_interruptible(&hwlat_data.lock))
>>>> + break;
>>>
>>> So basically this requires a signal to break it out of the loop?
>>>
>>> But if it receives a signal for any other reason, it breaks out of the loop
>>> too. Which is not what we want. If anything, it should be:
>>>
>>> if (mutex_lock_interruptible(&hwlat_data.lock))
>>> continue;
>>
>> As it calls msleep_interruptible() below and breaks out if a signal is
>> pending, I chose 'break' here too.
>>
>>> But I still don't really like this solution, as it will still report a
>>> deadlock.
>>>
>>> Is it possible to switch the cpu_read_lock() to be taken before the
>>> hwlat_data.lock?
>>
>> It's a little hard to change the order of these two locks; we would hold
>> "cpu_hotplug_lock" longer than necessary if we did that.
>>
>> But maybe we can remove the "hwlat_data.lock" in kthread_fn(), let me try
>> another modification.
>
> Have you found something yet? Looking at the code we have:
>
> mutex_lock(&hwlat_data.lock);
> interval = hwlat_data.sample_window - hwlat_data.sample_width;
> mutex_unlock(&hwlat_data.lock);
>
> Where the lock is only there to synchronize the calculation of the
> interval. We could add a counter for when sample_window and sample_width
> are updated, and we could simply do:
>
> again:
> counter = atomic_read(&hwlat_data.counter);
> smp_rmb();
> if (!(counter & 1)) {
> new_interval = hwlat_data.sample_window -
> hwlat_data.sample_width;
> smp_rmb();
> if (counter == atomic_read(&hwlat_data.counter))
> interval = new_interval;
> }
>
> Then we could do something like:
>
> atomic_inc(&hwlat_data.counter);
> smp_wmb();
> /* update sample_window or sample_width */
> smp_wmb();
> atomic_inc(&hwlat_data.counter);
>
> And then the interval will only be updated if the values are not being
> updated. Otherwise it just keeps the previous value.
>
Your seqlock-like solution seems able to solve this issue, but the
difficulty is that the current updates of sample_window and sample_width
are implemented through the 'trace_min_max_fops' framework, so we cannot
add 'atomic_inc(&hwlat_data.counter)' into their update paths directly.
If we want to remove the mutex_lock here, maybe we need to stop using
trace_min_max_write() for these two files. However, if we do that, we
could instead add a 'hwlat_data.sample_interval' and update it whenever
sample_window or sample_width changes. I haven't figured out an elegant
solution yet.
Thanks,
Wei