On 29/06/2026 15:11, Dmitry Osipenko wrote:
> Hi,
> 
> On 6/18/26 10:18, Ryosuke Yasuoka wrote:
>> virtio_gpu_queue_ctrl_sgs() and virtio_gpu_queue_cursor() wait for
>> virtqueue space using wait_event() with vqs_released as the only abort
>> condition. This covers the device removal path, where
>> virtio_gpu_release_vqs() sets the flag, but does not help when the host
>> simply stops processing the virtqueue while the device remains present.
>> 
>> In that case, the virtqueue fills up and subsequent command submissions
>> block indefinitely in D state with no diagnostic output, making the root
>> cause difficult to identify.
>> 
>> Replace the bare wait_event() with a wait_event_timeout() loop that
>> prints a warning every 5 seconds while the virtqueue remains full. The
>> wait still blocks indefinitely so driver behavior is unchanged. The
>> warnings help identify an unresponsive host device during
>> troubleshooting.
>> 
>> Signed-off-by: Ryosuke Yasuoka <[email protected]>
>> ---
>> When the host stops processing the virtio-gpu virtqueue without
>> triggering device removal, the bare wait_event() in
>> virtio_gpu_queue_ctrl_sgs() and virtio_gpu_queue_cursor() blocks
>> indefinitely with no diagnostic output. A DRM atomic commit worker
>> blocks in virtio_gpu_queue_fenced_ctrl_buffer() while holding the
>> modeset lock. During graceful shutdown, systemd (PID 1) needs the same
>> lock — either by writing to the console via fbcon, or by closing a DRM
>> file descriptor that triggers framebuffer cleanup — and blocks as well,
>> making the VM unrecoverable without a forced power-off.
>> 
>>   PID: 553    COMMAND: "kworker/u4:3"
>>    #0 __schedule
>>    #1 schedule
>>    #2 virtio_gpu_queue_fenced_ctrl_buffer [virtio_gpu]
>>    #3 virtio_gpu_primary_plane_update [virtio_gpu]
>>    ...
>> 
>>   PID: 1      COMMAND: "systemd"  (console write path)
>>    #0 __schedule
>>    #1 schedule
>>    #2 schedule_preempt_disabled
>>    #3 __ww_mutex_lock
>>    #4 drm_modeset_lock [drm]
>>    #5 drm_atomic_get_plane_state [drm]
>>    #6 drm_client_modeset_commit_atomic [drm]
>>    #7 drm_client_modeset_commit_locked [drm]
>>    #8 drm_fb_helper_pan_display [drm_kms_helper]
>>    #9 fb_pan_display
>>   #10 bit_update_start
>>   #11 fbcon_switch
>>   #12 redraw_screen
>>    ...
>> 
>> Reproduction steps:
>> 1. Build QEMU with the fault injection patch [1] that adds an
>>    x-ctrl-queue-broken property to virtio-gpu.
>> 2. Boot the VM and trigger the fault injection from the host.
>> 3. Fill the ctrlq (e.g., move the mouse on the guest's display).
>>    The process gets stuck in virtio_gpu_queue_fenced_ctrl_buffer()
>>    in D state.
>> 4. Run a graceful shutdown command (shutdown now or reboot).
>> 5. The shutdown process hangs.
>> 
>> My earlier patch a46991b334f6 ("drm/virtio: abort virtqueue wait on
>> device removal to avoid hung task") covers the case where the shutdown
>> process reaches the device_shutdown() call path, which sets vqs_released
>> to unblock the wait. However, during graceful shutdown, systemd (PID 1)
>> gets stuck on the modeset lock before ever reaching device_shutdown(),
>> so vqs_released is never set and the wait is never unblocked.
>> 
>> I initially considered adding a module parameter to abort the wait with
>> -ENODEV on timeout:
>> 
>>   +static unsigned int virtio_gpu_vq_timeout;
>>   +MODULE_PARM_DESC(vq_timeout,
>>   +     "Timeout in seconds for virtqueue wait (0 = no timeout, default)");
>>   +module_param_named(vq_timeout, virtio_gpu_vq_timeout, uint, 0444);
>>   ...
>>   +             if (virtio_gpu_vq_timeout) {
>>   +                     if (!wait_event_timeout(vgdev->ctrlq.ack_queue,
>>   +                                             vq->num_free >= elemcnt ||
>>   +                                             vgdev->vqs_released,
>>   +                                             
>> secs_to_jiffies(virtio_gpu_vq_timeout))) {
>>   +                             if (fence && vbuf->objs)
>>   +                                     
>> virtio_gpu_array_unlock_resv(vbuf->objs);
>>   +                             free_vbuf(vgdev, vbuf);
>>   +                             drm_dev_exit(idx);
>>   +                             return -ENODEV;
>>   +                     }
>>   +             } else {
>>   +                     wait_event(vgdev->ctrlq.ack_queue,
>>   +                                vq->num_free >= elemcnt ||
>>   +                                vgdev->vqs_released);
>>   +             }
>> 
>> This approach aborts the wait and allows the graceful shutdown process
>> to eventually proceed, albeit with a delay.
>> 
>> But that approach has drawbacks: it allows users to set arbitrarily
>> short timeouts that could destabilize the driver, and aborting commands
>> mid-flight is a rough recovery path. An unconditional timeout was also
>> discussed previously [2] but is not appropriate without virtio
>> specification support.
>> 
>> This patch takes a safer approach: replace the bare wait_event() with
>> wait_event_timeout() and print a warning every 5 seconds while the
>> virtqueue remains full. The wait still blocks indefinitely and no
>> commands are aborted, so driver behavior is unchanged. The warnings
>> help identify an unresponsive host device during troubleshooting.
>> Once the user notices the warning, they can work around the hang by
>> unbinding the VT from fbcon, removing the device, or forcing a shutdown
>> via SysRq.
>> 
>> [1] https://gist.github.com/YsuOS/fbcd181752594af35f954953a1d260b8
>> [2] 
>> https://lore.kernel.org/all/[email protected]/
>> ---
>>  drivers/gpu/drm/virtio/virtgpu_vq.c | 18 ++++++++++++++----
>>  1 file changed, 14 insertions(+), 4 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c 
>> b/drivers/gpu/drm/virtio/virtgpu_vq.c
>> index 68d097ad9d1d..a546130d3b6a 100644
>> --- a/drivers/gpu/drm/virtio/virtgpu_vq.c
>> +++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
>> @@ -410,8 +410,13 @@ static int virtio_gpu_queue_ctrl_sgs(struct 
>> virtio_gpu_device *vgdev,
>>      if (vq->num_free < elemcnt) {
>>              spin_unlock(&vgdev->ctrlq.qlock);
>>              virtio_gpu_notify(vgdev);
>> -            wait_event(vgdev->ctrlq.ack_queue,
>> -                       vq->num_free >= elemcnt || vgdev->vqs_released);
>> +            while (!wait_event_timeout(vgdev->ctrlq.ack_queue,
>> +                                       vq->num_free >= elemcnt ||
>> +                                       vgdev->vqs_released,
>> +                                       5 * HZ) && !vgdev->vqs_released)
>> +                    DRM_WARN("ctrlq waiting for host: no free space for %d 
>> secs\n",
>> +                             5);
>> +
>>              /*
>>               * Set by virtio_gpu_release_vqs() to unblock
>>               * synchronize_srcu() wait in drm_dev_unplug().
>> @@ -592,8 +597,13 @@ static void virtio_gpu_queue_cursor(struct 
>> virtio_gpu_device *vgdev,
>>      ret = virtqueue_add_sgs(vq, sgs, outcnt, 0, vbuf, GFP_ATOMIC);
>>      if (ret == -ENOSPC) {
>>              spin_unlock(&vgdev->cursorq.qlock);
>> -            wait_event(vgdev->cursorq.ack_queue,
>> -                       vq->num_free >= outcnt || vgdev->vqs_released);
>> +            while (!wait_event_timeout(vgdev->cursorq.ack_queue,
>> +                                       vq->num_free >= outcnt ||
>> +                                       vgdev->vqs_released,
>> +                                       5 * HZ) && !vgdev->vqs_released)
>> +                    DRM_WARN("cursorq waiting for host: no free space for 
>> %d secs\n",
>> +                             5);
>> +
>>              /* See comment in virtio_gpu_queue_ctrl_sgs(). */
>>              if (vgdev->vqs_released) {
>>                      free_vbuf(vgdev, vbuf);
> 
> The [1] has a valid point about the hangcheck warning. Is
> wait_event_timeout() worth the change then?
> 
> [1]
> https://sashiko.dev/#/patchset/20260618-virtiogpu_add_timeout-v1-1-dc36cef609d9%40redhat.com

Thanks for pointing this out.

You're right — the wait_event_timeout() loop wakes the task every
5 seconds, resetting the context switch counter each time, so
khungtaskd will never detect these tasks as hung.

I originally thought the earlier, virtio-specific warning would be
more useful for troubleshooting than the generic hung task output.
But reconsidering, the practical workflow for the user doesn't
really change either way: the device gets stuck, the user sees a
diagnostic message — whether that's the hung task warning or my
proposed DRM_WARN — and then manually intervenes with SysRq or a
forced shutdown.

The difference is what diagnostic information is available. The
hung task warning includes a full stack trace pointing directly at
the virtqueue wait, which is more useful for identifying the root
cause than a short warning line. Suppressing that by resetting the
context switch counter every 5 seconds makes the diagnostics
strictly worse.

More fundamentally, the guest can't recover from this situation
regardless — the root cause is an unresponsive host, and the fix
has to happen on the host side. The existing wait_event() with the
kernel's built-in hung task detection is the better behavior here.

I'll drop this patch.

Best regards,
Ryosuke


Reply via email to