On 02/07/2026 07:43, Hillf Danton wrote:
> On Wed, 1 Jul 2026 09:23:05 +0000 Ryosuke Yasuoka wrote:
>>On 30/06/2026 16:46, Dmitry Osipenko wrote:
>>> Could you please move drm_kms_helper_hotplug_event() to virtio_gpu_init(),
>>> placing it after wait_event_timeout(display_info_pending)? This will avoid
>>> additional work_struct that otherwise needs to be cancelled in
>>> virtio_gpu_init() on the timeout.
>>
>>IIUC, moving the drm_kms_helper_hotplug_event() and _hpd_irq_event()
>>into virtio_gpu_init() after wait_event_timeout() would not prevent the
>>issue.
>>
>>Looking at the syzbot call traces[1][2], the deadlock occurs during
>>drm_client_setup(), which runs after virtio_gpu_init() has already
>>returned. The display_info_cb that triggers the deadlock is called from
>>the dequeue worker while drm_client_register() holds clientlist_mutex.
>>
>>Thread A:
>>virtio_gpu_probe()
>> -> virtio_gpu_init()   // sends GET_DISPLAY_INFO and waits up to 5s
>> -> drm_dev_register()
>> -> drm_client_setup()   // deadlock happens HERE
>>     -> drm_client_register()   // holds clientlist_mutex
>>     ...
>>     -> virtio_gpu_queue_fenced_ctrl_buffer()
>>         -> wait_event() // waits for free space
>>
>>Thread B:
>>virtio_gpu_dequeue_ctrl_func()
>> -> reclaim_vbufs()   // make free space
> 
> I wonder if the deadlock could be cured by alternatively adding a wakeup
> before the responce cb, given that free space is available.

Thanks for your comment. I agree that moving wake_up() before response
cb would help in most cases, and I think it is a reasonable aproach.
However, there are a few rare scenarios where the deadlock can still
occur:

1. If reclaim_vbufs() does not free enough slots to satisfy
num_free >= elemcnt, wake_up() fires but the wait_event() condition
evaluates false. Thread A remains asleep holding clientlist_mutex, and
the dequeue worker proceeds to display_info_cb which blocks on
clientlist_mutex.
2. Even if enough slots are freed, there is a small window between
wait_event() returning and Thread A acquiring spin_lock(&qlock) at the
"goto again" path in virtio_gpu_queue_ctrl_sgs(). Another thread can
consume the freed slots during this window, causing Thread A to loop 
back to wait_event() still holding clientlist_mutex.
3. Reordering wake_up() before resp_cb inverts the natural "process
response, then signal completion" pattern, which could introduce subtle
issues if a future resp_cb depends on side effects that should happen
after wake_up(), I believe.

These scenarios are likely rare in practice, but the schedule_work()
approach avoids all of them entirely by ensuring the dequeue worker
never touches clientlist_mutex. It also follows the same pattern
already used by config_changed_work in this driver.

Best regards
Ryosuke

> 
>> -> resp_cb()
>>     -> virtio_gpu_cmd_get_display_info_cb
>>         -> drm_helper_hpd_irq_event()
>>         -> drm_kms_helper_hotplug_event()
>>             -> drm_client_dev_hotplug() // need to lock clientlist_mutex
>> -> wake_up()   // never reached
>>
>>IIUC the hotplug notification in display_info_cb is needed because it
>>notifies DRM after updating by the callback with fresh data from the host.
>>
>>This work_struct ensures display_info_cb never blocks on
>>clientlist_mutex in the dequeue worker, while preserving the hotplug
>>notification with fresh data.
>>
>>[1] 
>>https://syzkaller.appspot.com/bug?id=d6dd6f86d3aaf7eebe7406e45c1c6e549453f224
>>[2] 
>>https://syzkaller.appspot.com/bug?id=908bd910da5dd79b88de4cf7baf376cc873a922e
>>
>>Best regards,
>>Ryosuke


Reply via email to