Re: [RFC PATCH v2 0/2] Virtio-GPU suspend and resume

Dmitry Osipenko Mon, 30 Jun 2025 19:00:14 -0700

On 6/18/25 01:41, Kim, Dongwon wrote:
...
>> Have you figured out why 10ms workaround is needed?
> 
> [Kim, Dongwon] Unfortunately, I don't know why it fails without the delay. I
> wanted to narrow down further so enabled printk during suspend and resume but
> hang didn't occur with the timing changes caused by printks.  I've also tried
> more deterministic methods that make it wait based on some kinds of "status"
> but none of them have worked so far. If you have any suggestions on possible
> condition we can check instead of just sleeping, please let me know.
> 10ms seems to be close to minimum to make it work 100% for several days
> (rtcwake sleep and wake up every 5 sec).


I was able to reproduce the hang and got a crash backtrace with
no_console_suspend:

[   63.824827] PM: suspend entry (deep)
[   63.825041] Filesystems sync: 0.000 seconds
[   63.990951] Freezing user space processes
[   63.992488] Freezing user space processes completed (elapsed 0.001 seconds)
[   63.992775] OOM killer disabled.
[   63.992902] Freezing remaining freezable tasks
[   63.994099] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[   64.002183] Oops: general protection fault, probably for non-canonical address 0x2abe0ea26847fb08: 0000 [#1] SMP NOPTI
[   64.003172] CPU: 9 UID: 0 PID: 178 Comm: kworker/9:2 Not tainted 6.15.4-00002-g01117b4373b2-dirty #123 PREEMPT(voluntary)
[   64.003614] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[   64.004036] Workqueue: events virtio_gpu_dequeue_ctrl_func
[   64.004280] RIP: 0010:virtqueue_get_buf_ctx_split+0x86/0x130
[   64.004515] Code: 01 66 23 43 50 0f b7 c0 8b 74 c1 04 8b 44 c1 08 41 89 45 00 3b 73 58 0f 83 96 d7 20 ff 89 f0 48 c1 e0 04 48 03 83 80 00 00 00 <4c> 8b 20 4d 85 e4 0f 84 5a d7 20 ff 48 89 df e8 46 fc ff ff 0f b7
[   64.005227] RSP: 0018:ffffc90000b53d90 EFLAGS: 00010202
[   64.005430] RAX: 2abe0ea26847fb08 RBX: ffff888102d58a00 RCX: ffff8881255314c0
[   64.005698] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff888102d58a00
[   64.005975] RBP: ffffc90000b53db0 R08: 8080808080808080 R09: ffff88885b470b40
[   64.006273] R10: ffff8881000508c8 R11: fefefefefefefeff R12: 0000000000000001
[   64.006907] R13: ffffc90000b53dfc R14: ffffc90000b53dfc R15: ffff8881032d0568
[   64.007205] FS:  0000000000000000(0000) GS:ffff8888d6650000(0000) knlGS:0000000000000000
[   64.007511] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   64.007732] CR2: 00007efedc4d3000 CR3: 00000001056e9000 CR4: 0000000000750ef0
[   64.008014] PKRU: 55555554
[   64.008123] Call Trace:
[   64.008223]  <TASK>
[   64.008314]  virtqueue_get_buf+0x46/0x60
[   64.008465]  virtio_gpu_dequeue_ctrl_func+0x86/0x2a0
[   64.008655]  process_one_work+0x18a/0x370
[   64.008823]  worker_thread+0x31a/0x460
[   64.008971]  ? _raw_spin_unlock_irqrestore+0x27/0x50
[   64.009176]  ? srso_alias_return_thunk+0x5/0xfbef5
[   64.009369]  ? __pfx_worker_thread+0x10/0x10
[   64.009532]  kthread+0x126/0x230
[   64.009662]  ? _raw_spin_unlock_irq+0x1f/0x40
[   64.009836]  ? __pfx_kthread+0x10/0x10
[   64.009986]  ret_from_fork+0x3a/0x60
[   64.010156]  ? __pfx_kthread+0x10/0x10
[   64.010318]  ret_from_fork_asm+0x1a/0x30
[   64.010507]  </TASK>
[   64.010616] Modules linked in:
[   64.010785] ---[ end trace 0000000000000000 ]--- 

==

The trace shows that the virtio queue is still active after it has been
removed. This change fixes the crash, please test:

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.c b/drivers/gpu/drm/virtio/virtgpu_drv.c
index 03ab78b44ab3..48bb21f33306 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.c
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.c
@@ -187,6 +187,10 @@ static int virtgpu_freeze(struct virtio_device *vdev)
        flush_work(&vgdev->ctrlq.dequeue_work);
        flush_work(&vgdev->cursorq.dequeue_work);
        flush_work(&vgdev->config_changed_work);
+       wait_event(vgdev->ctrlq.ack_queue,
+                  vgdev->ctrlq.vq->num_free == vgdev->ctrlq.vq->num_max);
+       wait_event(vgdev->cursorq.ack_queue,
+                  vgdev->cursorq.vq->num_free == vgdev->cursorq.vq->num_max);
        vdev->config->del_vqs(vdev);
 
        return 0;
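
For illustration only, here is a small userspace sketch of the same
drain-before-teardown idea: the freeze path blocks until num_free == num_max
and only then destroys the queue. Every toy_* name is hypothetical, and a
pthread condition variable stands in for wait_event() on ack_queue; this is a
model of the ordering, not the driver code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a virtqueue; NOT the kernel struct. */
struct toy_vq {
	pthread_mutex_t lock;
	pthread_cond_t ack;     /* plays the role of ack_queue */
	unsigned int num_free;  /* descriptors not currently in flight */
	unsigned int num_max;   /* ring size */
};

struct toy_dev_args {
	struct toy_vq *vq;
	unsigned int pending;   /* buffers the fake device must still return */
};

static void toy_vq_init(struct toy_vq *vq, unsigned int num_max)
{
	pthread_mutex_init(&vq->lock, NULL);
	pthread_cond_init(&vq->ack, NULL);
	vq->num_free = num_max;
	vq->num_max = num_max;
}

/* Driver side: put one buffer in flight; fails if the ring is full. */
static bool toy_vq_submit(struct toy_vq *vq)
{
	bool ok;

	pthread_mutex_lock(&vq->lock);
	ok = vq->num_free > 0;
	if (ok)
		vq->num_free--;
	pthread_mutex_unlock(&vq->lock);
	return ok;
}

/* Completion side: one buffer comes back, then wake any waiter. */
static void toy_vq_complete(struct toy_vq *vq)
{
	pthread_mutex_lock(&vq->lock);
	if (vq->num_free < vq->num_max)
		vq->num_free++;
	pthread_cond_broadcast(&vq->ack);
	pthread_mutex_unlock(&vq->lock);
}

/* Freeze side: block until no buffer is in flight any more. */
static void toy_vq_drain(struct toy_vq *vq)
{
	pthread_mutex_lock(&vq->lock);
	while (vq->num_free != vq->num_max)
		pthread_cond_wait(&vq->ack, &vq->lock);
	pthread_mutex_unlock(&vq->lock);
}

/* Fake device: returns the pending buffers asynchronously. */
static void *toy_device_thread(void *arg)
{
	struct toy_dev_args *a = arg;

	while (a->pending--)
		toy_vq_complete(a->vq);
	return NULL;
}

/* Submit up to `inflight` buffers, drain, tear down; returns num_free at teardown. */
static unsigned int toy_freeze_demo(unsigned int num_max, unsigned int inflight)
{
	struct toy_vq vq;
	struct toy_dev_args args = { .vq = &vq, .pending = 0 };
	pthread_t dev;
	unsigned int seen;

	toy_vq_init(&vq, num_max);
	while (inflight-- && toy_vq_submit(&vq))
		args.pending++;

	pthread_create(&dev, NULL, toy_device_thread, &args);
	toy_vq_drain(&vq);              /* the step the patch adds before del_vqs */
	pthread_join(dev, NULL);

	seen = vq.num_free;
	assert(seen == vq.num_max);     /* nothing in flight when we tear down */
	pthread_cond_destroy(&vq.ack);
	pthread_mutex_destroy(&vq.lock);
	return seen;
}
```

Calling toy_freeze_demo(8, 3) submits three buffers, lets the fake device
return them from another thread, and only destroys the queue once all eight
descriptors are free again.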

-- 
Best regards,
Dmitry
