On Thu, 2 Jul 2026 07:12:44 +0000 Ryosuke Yasuoka wrote:
>On 02/07/2026 07:43, Hillf Danton wrote:
>> On Wed, 1 Jul 2026 09:23:05 +0000 Ryosuke Yasuoka wrote:
>>>On 30/06/2026 16:46, Dmitry Osipenko wrote:
>>>> Could you please move drm_kms_helper_hotplug_event() to virtio_gpu_init(),
>>>> placing it after wait_event_timeout(display_info_pending)? This will avoid
>>>> additional work_struct that otherwise needs to be cancelled in
>>>> virtio_gpu_init() on the timeout.
>>>
>>>IIUC, moving the drm_kms_helper_hotplug_event() and _hpd_irq_event()
>>>into virtio_gpu_init() after wait_event_timeout() would not prevent the
>>>issue.
>>>
>>>Looking at the syzbot call traces[1][2], the deadlock occurs during
>>>drm_client_setup(), which runs after virtio_gpu_init() has already
>>>returned. The display_info_cb that triggers the deadlock is called from
>>>the dequeue worker while drm_client_register() holds clientlist_mutex.
>>>
>>>Thread A:
>>>virtio_gpu_probe()
>>> -> virtio_gpu_init()   // sends GET_DISPLAY_INFO and waits up to 5s
>>> -> drm_dev_register()
>>> -> drm_client_setup()   // deadlock happens HERE
>>>     -> drm_client_register()   // holds clientlist_mutex
>>>     ...
>>>     -> virtio_gpu_queue_fenced_ctrl_buffer()
>>>         -> wait_event() // waits for free space
>>>
>>>Thread B:
>>>virtio_gpu_dequeue_ctrl_func()
>>> -> reclaim_vbufs()   // make free space
>> 
>> I wonder if the deadlock could be cured by alternatively adding a wakeup
>> before the responce cb, given that free space is available.
>
>Thanks for your comment. I agree that moving wake_up() before response
>cb would help in most cases, and I think it is a reasonable aproach.
>However, there are a few rare scenarios where the deadlock can still
>occur:
>
>1. If reclaim_vbufs() does not free enough slots to satisfy
>num_free >= elemcnt, wake_up() fires but the wait_event() condition
>evaluates false. Thread A remains asleep holding clientlist_mutex, and
>the dequeue worker proceeds to display_info_cb which blocks on
>clientlist_mutex.

        task0           task1
        lock clientlist_mutex
        wait_event(for enough free slots)

                        free some slots
                        lock clientlist_mutex
                        unlock clientlist_mutex
                        wakeup
        deadlock

The clientlist_mutex will not be released without enough slots freed, so more
free slots and wakeup are needed to cure the deadlock, with nothing to do with
acquiring the mutex in the workqueue context.

Your workqueue approach works by removing the dependence of wakeup on acquiring
the mutex, without enough free slots guaranteed.

In short, wakeup before acquiring the mutex is the right thing to do, I mean
it is not the only one at all.

>2. Even if enough slots are freed, there is a small window between
>wait_event() returning and Thread A acquiring spin_lock(&qlock) at the
>"goto again" path in virtio_gpu_queue_ctrl_sgs(). Another thread can
>consume the freed slots during this window, causing Thread A to loop 
>back to wait_event() still holding clientlist_mutex.

Ditto

>3. Reordering wake_up() before resp_cb inverts the natural "process
>response, then signal completion" pattern, which could introduce subtle
>issues if a future resp_cb depends on side effects that should happen
>after wake_up(), I believe.
>
Different taste

>These scenarios are likely rare in practice, but the schedule_work()
>approach avoids all of them entirely by ensuring the dequeue worker
>never touches clientlist_mutex. It also follows the same pattern
>already used by config_changed_work in this driver.
>
>Best regards
>Ryosuke
>
>> 
>>> -> resp_cb()
>>>     -> virtio_gpu_cmd_get_display_info_cb
>>>         -> drm_helper_hpd_irq_event()
>>>         -> drm_kms_helper_hotplug_event()
>>>             -> drm_client_dev_hotplug() // need to lock clientlist_mutex
>>> -> wake_up()   // never reached
>>>
>>>IIUC the hotplug notification in display_info_cb is needed because it
>>>notifies DRM after updating by the callback with fresh data from the host.
>>>
>>>This work_struct ensures display_info_cb never blocks on
>>>clientlist_mutex in the dequeue worker, while preserving the hotplug
>>>notification with fresh data.
>>>
>>>[1] 
>>>https://syzkaller.appspot.com/bug?id=d6dd6f86d3aaf7eebe7406e45c1c6e549453f224
>>>[2] 
>>>https://syzkaller.appspot.com/bug?id=908bd910da5dd79b88de4cf7baf376cc873a922e
>>>
>>>Best regards,
>>>Ryosuke

Reply via email to