Thank you for your contribution! Sashiko AI review found 4 potential issue(s) 
to consider:

New issues:
- [High] Replacing `wait_event()` with a 5-second `wait_event_timeout()` loop 
defeats the kernel's hung task detector for this wait, breaking automated 
recovery mechanisms.
- [Medium] The `wait_event_timeout()` loop will generate severe log spam if 
multiple userspace threads block concurrently.

Pre-existing issues:
- [High] The `virtio_gpu_object_array` (`vbuf->objs`) and its contained GEM 
objects are leaked on error paths where `virtio_gpu_queue_ctrl_sgs()` fails to 
queue the buffer.
- [High] The virtio-gpu driver sleeps in plane atomic update callbacks, 
violating DRM subsystem guidelines against sleeping in atomic context.
--

commit 736ae781cca53398969e63f0d5a23d6e3f738db0
Author: Ryosuke Yasuoka <[email protected]>

drm/virtio: warn when virtqueue has no free space for too long

This commit replaces the bare wait_event() in virtqueue command submission
with a wait_event_timeout() loop. The loop prints a warning every 5 seconds
while the virtqueue remains full to help identify unresponsive host devices.

> diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c 
> b/drivers/gpu/drm/virtio/virtgpu_vq.c
> index 68d097ad9d1d0..a546130d3b6a5 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_vq.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
> @@ -410,8 +410,13 @@ static int virtio_gpu_queue_ctrl_sgs(struct 
> virtio_gpu_device *vgdev,
>       if (vq->num_free < elemcnt) {
>               spin_unlock(&vgdev->ctrlq.qlock);
>               virtio_gpu_notify(vgdev);
> -             wait_event(vgdev->ctrlq.ack_queue,
> -                        vq->num_free >= elemcnt || vgdev->vqs_released);
> +             while (!wait_event_timeout(vgdev->ctrlq.ack_queue,
> +                                        vq->num_free >= elemcnt ||
> +                                        vgdev->vqs_released,
> +                                        5 * HZ) && !vgdev->vqs_released)

[Severity: High]
Does this 5-second polling loop evade the kernel's hung task detector?

The kernel's khungtaskd detects tasks stuck in TASK_UNINTERRUPTIBLE by
verifying their context switch count does not change over the configured
timeout. By waking the task up every 5 seconds, it will continually context
switch, causing khungtaskd to see a changing context switch count. Will this
prevent the system from ever flagging these tasks as hung or triggering
automatic recovery panics?

> +                     DRM_WARN("ctrlq waiting for host: no free space for %d 
> secs\n",
> +                              5);

[Severity: Medium]
Could this cause severe kernel log spam if multiple userspace threads are
blocked concurrently?

Since the 5-second timeout and warning are local to each thread's wait loop,
if multiple threads attempt to submit commands while the host is unresponsive,
will each thread emit a DRM_WARN message every 5 seconds?

> +
>               /*
>                * Set by virtio_gpu_release_vqs() to unblock
>                * synchronize_srcu() wait in drm_dev_unplug().

[Severity: High]
This is a pre-existing issue, but does this error path permanently leak the
virtio_gpu_object_array and its pinned GEM objects?

When vgdev->vqs_released is true, the code below this comment calls
free_vbuf(vgdev, vbuf) and returns -ENODEV. However, free_vbuf() only frees
the buffer and does not unreference or free vbuf->objs. Since the buffer was
never queued, the completion callback that normally cleans up vbuf->objs will
never run.

[ ... ]
> @@ -592,8 +597,13 @@ static void virtio_gpu_queue_cursor(struct 
> virtio_gpu_device *vgdev,
>       ret = virtqueue_add_sgs(vq, sgs, outcnt, 0, vbuf, GFP_ATOMIC);
>       if (ret == -ENOSPC) {
>               spin_unlock(&vgdev->cursorq.qlock);
> -             wait_event(vgdev->cursorq.ack_queue,
> -                        vq->num_free >= outcnt || vgdev->vqs_released);
> +             while (!wait_event_timeout(vgdev->cursorq.ack_queue,
> +                                        vq->num_free >= outcnt ||
> +                                        vgdev->vqs_released,
> +                                        5 * HZ) && !vgdev->vqs_released)

[Severity: High]
This isn't a bug introduced by this patch, but does waiting here violate DRM
subsystem atomic context rules?

This function is reached via the plane atomic update callback:

virtio_gpu_cursor_plane_update() ->
  virtio_gpu_cursor_ping() ->
    virtio_gpu_queue_cursor()

DRM plane atomic update callbacks execute in atomic context where sleeping
functions are forbidden. Using wait_event_timeout() (or previously
wait_event()) sets the task state to TASK_UNINTERRUPTIBLE and schedules out.
Could this scheduling while atomic cause system instability?

> +                     DRM_WARN("cursorq waiting for host: no free space for 
> %d secs\n",
> +                              5);
> +
>               /* See comment in virtio_gpu_queue_ctrl_sgs(). */
>               if (vgdev->vqs_released) {
>                       free_vbuf(vgdev, vbuf);

-- 
Sashiko AI review ยท 
https://sashiko.dev/#/patchset/[email protected]?part=1

Reply via email to