Hi Dmitry,
Thank you for your review.

On 14/05/2026 00:54, Dmitry Osipenko wrote:
> On 5/14/26 00:40, Dmitry Osipenko wrote:
>> Hi,
>>
>> On 5/12/26 11:59, Ryosuke Yasuoka wrote:
>>> virtio_gpu_queue_ctrl_sgs() and virtio_gpu_queue_cursor() use
>>> wait_event() without timeout when waiting for virtqueue space. If the
>>> host device stops processing commands, these waits block indefinitely.
>>> Since callers may hold DRM locks, this can make the entire system
>>> unresponsive.
>>>
>>> Replace wait_event() with wait_event_timeout() using a 5-second timeout,
>>> consistent with the existing timeout pattern in the driver. On timeout,
>>> clean up and return -ENODEV, following the same error path as
>>> drm_dev_enter() failure.
>>>
>>> Reported-by:
>>> syzbot+d6dd6f86d3aaf7eebe7406e45c1c6e549453f...@syzkaller.appspotmail.com
>>> Closes:
>>> https://syzkaller.appspot.com/bug?id=d6dd6f86d3aaf7eebe7406e45c1c6e549453f224
>>> Reported-by:
>>> syzbot+908bd910da5dd79b88de4cf7baf376cc873a9...@syzkaller.appspotmail.com
>>> Closes:
>>> https://syzkaller.appspot.com/bug?id=908bd910da5dd79b88de4cf7baf376cc873a922e
>>> Signed-off-by: Ryosuke Yasuoka <[email protected]>
>>> ---
>>> drivers/gpu/drm/virtio/virtgpu_vq.c | 20 ++++++++++++++++++--
>>> 1 file changed, 18 insertions(+), 2 deletions(-)
>>
>> If host stops processing commands, this is a problem on host side. Isn't it?

Yes, it is. But the guest has no way to recover from this situation on
its own. The wait_event{,_timeout}() is inside the device critical
section between drm_dev_enter/exit(). Removing the device via sysfs and
graceful shutdown call drm_dev_unplug(), which is blocked until the
critical section completes, so they cannot proceed either. Also, IIUC
the virtio-gpu device cannot be hot-unplugged from the host side. The
only option left is a forced reboot.

> It may be acceptable to have wait_event_timeout() in a loop, printing
> warnings about unresponsive host.

I considered this approach, but it does not solve the recovery problem
described above. The guest would still be stuck in the loop with no way
to remove the device or shut down gracefully.

> Don't think we can assume that 5 seconds is enough to say that host is
> busted, unless spec says so.
>
> There could be a driver module parameter, specifying the timeoout. This
> will be acceptable as user takes responsibility for the special timeout
> behaviour.

Agreed. In v2, the default behavior is preserved (wait_event(), which
waits indefinitely). A new 'timeout' module parameter (in seconds) lets
the user specify when to give up. When set to a non-zero value, the
driver logs a warning and returns -ENODEV once the specified duration
has elapsed.

Also, I'm considering proposing a host response timeout in the
virtio-gpu specification. If a spec-defined timeout is accepted in the
future, the driver could use it as the default instead of relying on the
module parameter.

Best regards,
Ryosuke
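
P.S. To make the intended 'timeout' semantics concrete, here is a small
userspace model. It is only a sketch and not the v2 patch: struct
fake_host, vq_has_space(), wait_for_vq_space() and max_sim_sec are
hypothetical names invented for illustration, time is simulated in whole
seconds, and -EINPROGRESS merely marks "still blocked when the
simulation ended", which the driver would never return.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the host side of a virtqueue. */
struct fake_host {
	unsigned int frees_space_at;	/* second at which space appears */
	bool responsive;		/* false: host never frees space */
};

/* Model of the wait condition: is there room in the queue at `now`? */
static bool vq_has_space(const struct fake_host *host, unsigned int now)
{
	return host->responsive && now >= host->frees_space_at;
}

/*
 * Model of the proposed wait. timeout_sec == 0 keeps the current
 * behaviour (wait indefinitely); a non-zero value gives up with -ENODEV.
 * max_sim_sec only bounds the simulation so that the "wait forever"
 * case terminates here; it has no counterpart in the driver.
 */
static int wait_for_vq_space(const struct fake_host *host,
			     unsigned int timeout_sec,
			     unsigned int max_sim_sec)
{
	unsigned int now;

	for (now = 0; now <= max_sim_sec; now++) {
		/* The condition is checked first, so space that appears
		 * exactly at the deadline still counts as success. */
		if (vq_has_space(host, now))
			return 0;
		if (timeout_sec && now >= timeout_sec)
			return -ENODEV;
	}
	return -EINPROGRESS;	/* still blocked when the simulation ended */
}
```

With timeout_sec == 0, a dead host leaves the caller blocked, as today.
With timeout_sec == 5, a dead host yields -ENODEV, but so does a host
that is merely slow and frees space at second 7. That second case is the
reason the non-zero value is opt-in rather than the default.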

