On Tue, 24 Sep 2024 17:45:15 +0800
Wei Li <[email protected]> wrote:
> Another "hung task" error was reported during the test, and i figured out
> the deadlock scenario is as follows:
>
> T1 [BP] | T2 [AP] | T3 [hwlatd/1]
> | T4
> work_for_cpu_fn() | cpuhp_thread_fun() | kthread_fn()
> | hwlat_hotplug_workfn()
> _cpu_down() | stop_cpu_kthread() |
> | mutex_lock(&hwlat_data.lock)
> cpus_write_lock() | kthread_stop(hwlatd/1) |
> mutex_lock(&hwlat_data.lock) |
> __cpuhp_kick_ap() | wait_for_completion() |
> | cpus_read_lock()
>
> It constitutes ABBA deadlock indirectly between "cpu_hotplug_lock" and
> "hwlat_data.lock", make the mutex obtaining in kthread_fn() interruptible
> to fix this.
>
> Fixes: ba998f7d9531 ("trace/hwlat: Support hotplug operations")
> Signed-off-by: Wei Li <[email protected]>
> ---
> kernel/trace/trace_hwlat.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
> index 3bd6071441ad..4c228ccb8a38 100644
> --- a/kernel/trace/trace_hwlat.c
> +++ b/kernel/trace/trace_hwlat.c
> @@ -370,7 +370,8 @@ static int kthread_fn(void *data)
> get_sample();
> local_irq_enable();
>
> - mutex_lock(&hwlat_data.lock);
> + if (mutex_lock_interruptible(&hwlat_data.lock))
> + break;
So basically this requires as signal to break it out of the loop?
But if it receives a signal for any other reason, it breaks out of the loop
too. Which is not what we want. If anything, it should be:
if (mutex_lock_interruptible(&hwlat_data.lock))
continue;
But I still don't really like this solution, as it will still report a
deadlock.
Is it possible to switch the cpu_read_lock() to be taken before the
hwlat_data.lock?
-- Steve
> interval = hwlat_data.sample_window - hwlat_data.sample_width;
> mutex_unlock(&hwlat_data.lock);
>