virtio_gpu_queue_ctrl_sgs() and virtio_gpu_queue_cursor() wait for virtqueue space using wait_event() with vqs_released as the only abort condition. This covers the device removal path, where virtio_gpu_release_vqs() sets the flag, but does not help when the host simply stops processing the virtqueue while the device remains present.
In that case, the virtqueue fills up and subsequent command submissions block indefinitely in D state with no diagnostic output, making the root cause difficult to identify. Replace the bare wait_event() with a wait_event_timeout() loop that prints a warning every 5 seconds while the virtqueue remains full. The wait still blocks indefinitely so driver behavior is unchanged. The warnings help identify an unresponsive host device during troubleshooting. Signed-off-by: Ryosuke Yasuoka <[email protected]> --- When the host stops processing the virtio-gpu virtqueue without triggering device removal, the bare wait_event() in virtio_gpu_queue_ctrl_sgs() and virtio_gpu_queue_cursor() blocks indefinitely with no diagnostic output. A DRM atomic commit worker blocks in virtio_gpu_queue_fenced_ctrl_buffer() while holding the modeset lock. During graceful shutdown, systemd (PID 1) needs the same lock — either by writing to the console via fbcon, or by closing a DRM file descriptor that triggers framebuffer cleanup — and blocks as well, making the VM unrecoverable without a forced power-off. PID: 553 COMMAND: "kworker/u4:3" #0 __schedule #1 schedule #2 virtio_gpu_queue_fenced_ctrl_buffer [virtio_gpu] #3 virtio_gpu_primary_plane_update [virtio_gpu] ... PID: 1 COMMAND: "systemd" (console write path) #0 __schedule #1 schedule #2 schedule_preempt_disabled #3 __ww_mutex_lock #4 drm_modeset_lock [drm] #5 drm_atomic_get_plane_state [drm] #6 drm_client_modeset_commit_atomic [drm] #7 drm_client_modeset_commit_locked [drm] #8 drm_fb_helper_pan_display [drm_kms_helper] #9 fb_pan_display #10 bit_update_start #11 fbcon_switch #12 redraw_screen ... Reproduction steps: 1. Build QEMU with the fault injection patch [1] that adds an x-ctrl-queue-broken property to virtio-gpu. 2. Boot the VM and trigger the fault injection from the host. 3. Fill the ctrlq (e.g., move the mouse on the guest's display). The process gets stuck in virtio_gpu_queue_fenced_ctrl_buffer() in D state. 4. Run a graceful shutdown command (shutdown now or reboot). 5. The shutdown process hangs. My earlier patch a46991b334f6 ("drm/virtio: abort virtqueue wait on device removal to avoid hung task") covers the case where the shutdown process reaches the device_shutdown() call path, which sets vqs_released to unblock the wait. However, during graceful shutdown, systemd (PID 1) gets stuck on the modeset lock before ever reaching device_shutdown(), so vqs_released is never set and the wait is never unblocked. I initially considered adding a module parameter to abort the wait with -ENODEV on timeout: +static unsigned int virtio_gpu_vq_timeout; +MODULE_PARM_DESC(vq_timeout, + "Timeout in seconds for virtqueue wait (0 = no timeout, default)"); +module_param_named(vq_timeout, virtio_gpu_vq_timeout, uint, 0444); ... + if (virtio_gpu_vq_timeout) { + if (!wait_event_timeout(vgdev->ctrlq.ack_queue, + vq->num_free >= elemcnt || + vgdev->vqs_released, + secs_to_jiffies(virtio_gpu_vq_timeout))) { + if (fence && vbuf->objs) + virtio_gpu_array_unlock_resv(vbuf->objs); + free_vbuf(vgdev, vbuf); + drm_dev_exit(idx); + return -ENODEV; + } + } else { + wait_event(vgdev->ctrlq.ack_queue, + vq->num_free >= elemcnt || + vgdev->vqs_released); + } This approach aborts the wait and allows the graceful shutdown process to eventually proceed, albeit with a delay. But that approach has drawbacks: it allows users to set arbitrarily short timeouts that could destabilize the driver, and aborting commands mid-flight is a rough recovery path. An unconditional timeout was also discussed previously [2] but is not appropriate without virtio specification support. This patch takes a safer approach: replace the bare wait_event() with wait_event_timeout() and print a warning every 5 seconds while the virtqueue remains full. The wait still blocks indefinitely and no commands are aborted, so driver behavior is unchanged. The warnings help identify an unresponsive host device during troubleshooting. Once the user notices the warning, they can work around the hang by unbinding the VT from fbcon, removing the device, or forcing a shutdown via SysRq. [1] https://gist.github.com/YsuOS/fbcd181752594af35f954953a1d260b8 [2] https://lore.kernel.org/all/[email protected]/ --- drivers/gpu/drm/virtio/virtgpu_vq.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c index 68d097ad9d1d..a546130d3b6a 100644 --- a/drivers/gpu/drm/virtio/virtgpu_vq.c +++ b/drivers/gpu/drm/virtio/virtgpu_vq.c @@ -410,8 +410,13 @@ static int virtio_gpu_queue_ctrl_sgs(struct virtio_gpu_device *vgdev, if (vq->num_free < elemcnt) { spin_unlock(&vgdev->ctrlq.qlock); virtio_gpu_notify(vgdev); - wait_event(vgdev->ctrlq.ack_queue, - vq->num_free >= elemcnt || vgdev->vqs_released); + while (!wait_event_timeout(vgdev->ctrlq.ack_queue, + vq->num_free >= elemcnt || + vgdev->vqs_released, + 5 * HZ) && !vgdev->vqs_released) + DRM_WARN("ctrlq waiting for host: no free space for %d secs\n", + 5); + /* * Set by virtio_gpu_release_vqs() to unblock * synchronize_srcu() wait in drm_dev_unplug(). @@ -592,8 +597,13 @@ static void virtio_gpu_queue_cursor(struct virtio_gpu_device *vgdev, ret = virtqueue_add_sgs(vq, sgs, outcnt, 0, vbuf, GFP_ATOMIC); if (ret == -ENOSPC) { spin_unlock(&vgdev->cursorq.qlock); - wait_event(vgdev->cursorq.ack_queue, - vq->num_free >= outcnt || vgdev->vqs_released); + while (!wait_event_timeout(vgdev->cursorq.ack_queue, + vq->num_free >= outcnt || + vgdev->vqs_released, + 5 * HZ) && !vgdev->vqs_released) + DRM_WARN("cursorq waiting for host: no free space for %d secs\n", + 5); + /* See comment in virtio_gpu_queue_ctrl_sgs(). */ if (vgdev->vqs_released) { free_vbuf(vgdev, vbuf); --- base-commit: b9e2d5cdaab05c997be3a69d9b372d7676683e1b change-id: 20260616-virtiogpu_add_timeout-33e985718c22 Best regards, -- Ryosuke Yasuoka <[email protected]>

