Hi Steven,
On 2024/11/13 7:50, Steven Rostedt wrote:
>>>> Another "hung task" error was reported during the test, and I figured out
>>>> that the deadlock scenario is as follows:
>>>>
>>>> T1 [BP]           | T2 [AP]                | T3 [hwlatd/1]                | T4
>>>> work_for_cpu_fn() | cpuhp_thread_fun()     | kthread_fn()                 | hwlat_hotplug_workfn()
>>>> _cpu_down()       |                        |                              | mutex_lock(&hwlat_data.lock)
>>>> cpus_write_lock() | stop_cpu_kthread()     |                              |
>>>>                   | kthread_stop(hwlatd/1) | mutex_lock(&hwlat_data.lock) |
>>>> __cpuhp_kick_ap() | wait_for_completion()  |                              | cpus_read_lock()
>
> So, if we can make T3 not take the mutex_lock then that should be a
> solution, right?
>
>>>>
>>>> It constitutes an ABBA deadlock indirectly between "cpu_hotplug_lock" and
>>>> "hwlat_data.lock"; make the mutex acquisition in kthread_fn() interruptible
>>>> to fix this.
>>>>
>>>> Fixes: ba998f7d9531 ("trace/hwlat: Support hotplug operations")
>>>> Signed-off-by: Wei Li <[email protected]>
>>>> ---
>>>> kernel/trace/trace_hwlat.c | 3 ++-
>>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
>>>> index 3bd6071441ad..4c228ccb8a38 100644
>>>> --- a/kernel/trace/trace_hwlat.c
>>>> +++ b/kernel/trace/trace_hwlat.c
>>>> @@ -370,7 +370,8 @@ static int kthread_fn(void *data)
>>>> get_sample();
>>>> local_irq_enable();
>>>>
>>>> - mutex_lock(&hwlat_data.lock);
>>>> + if (mutex_lock_interruptible(&hwlat_data.lock))
>>>> + break;
>>>
>>> So basically this requires a signal to break it out of the loop?
>>>
>>> But if it receives a signal for any other reason, it breaks out of the loop
>>> too. Which is not what we want. If anything, it should be:
>>>
>>> if (mutex_lock_interruptible(&hwlat_data.lock))
>>> continue;
>>
>> As it calls msleep_interruptible() below and breaks out if a signal is
>> pending, I chose 'break' here too.
>>
>>> But I still don't really like this solution, as it will still report a
>>> deadlock.
>>>
>>> Is it possible to switch the cpu_read_lock() to be taken before the
>>> hwlat_data.lock?
>>
>> It's a little hard to change the order of these two locks; we would hold
>> "cpu_hotplug_lock" longer than necessary if we did that.
>>
>> But maybe we can remove the "hwlat_data.lock" in kthread_fn(), let me try
>> another modification.
>
> Have you found something yet? Looking at the code we have:
>
> mutex_lock(&hwlat_data.lock);
> interval = hwlat_data.sample_window - hwlat_data.sample_width;
> mutex_unlock(&hwlat_data.lock);
>
> Where the lock is only there to synchronize the calculation of the
> interval. We could add a counter for when sample_window and sample_width
> are updated, and we could simply do:
>
> again:
> counter = atomic_read(&hwlat_data.counter);
> smp_rmb();
> if (!(counter & 1)) {
> new_interval = hwlat_data.sample_window -
> hwlat_data.sample_width;
> smp_rmb();
> if (counter == atomic_read(&hwlat_data.counter))
> interval = new_interval;
> }
>
> Then we could do something like:
>
> atomic_inc(&hwlat_data.counter);
> smp_wmb();
> /* update sample_window or sample_width */
> smp_wmb();
> atomic_inc(&hwlat_data.counter);
>
> And then the interval will only be updated if the values are not being
> updated. Otherwise it just keeps the previous value.
>
Your seqlock-like solution seems able to solve this issue, but the
difficulty is that the current updates of sample_window and sample_width
are implemented through the 'trace_min_max_fops' framework, so we cannot
add 'atomic_inc(&hwlat_data.counter)' into their update paths directly.
If we want to remove the mutex_lock here, maybe we need to stop using
trace_min_max_write() for these two files. However, if we do that, we
could instead add a 'hwlat_data.sample_interval' and update it whenever
sample_window or sample_width changes. I haven't figured out an elegant
solution yet.
Thanks,
Wei