On 6/24/26 17:00, Denis V. Lunev wrote:
> On 6/24/26 16:55, David Hildenbrand (Arm) wrote:
>> On 6/24/26 16:08, Denis V. Lunev wrote:
>>> Commit 8bd2fa086a04 ("virtio: break and reset virtio devices on
>>> device_shutdown()") added a generic virtio bus .shutdown handler that
>>> breaks and resets every virtio device during device_shutdown(), i.e. on
>>> reboot and kexec.
>>>
>>> virtio_balloon provides no .shutdown of its own, so that generic path
>>> runs while the balloon's asynchronous work is still armed. Once the
>>> device has been broken, virtqueue_add_inbuf() in
>>> virtballoon_free_page_report() returns -EIO and trips its
>>> WARN_ON_ONCE(). On a kernel booted with panic_on_warn that turns an
>>> ordinary reboot, for example a kexec-based upgrade, into a fatal panic
>>> in the middle of device_shutdown(), so the machine never reaches the
>>> new kernel.
>>>
>>> Relaxing that single WARN_ON_ONCE() would only hide the symptom: the
>>> inflate/deflate and OOM paths do not warn, they call
>>> wait_event(vb->acked, ...) and would instead block forever on a broken
>>> queue that can no longer complete. The device has to be quiesced, not
>>> just kept quiet.
>>>
>>> Add a .shutdown handler that quiesces the balloon via the shared
>>> virtballoon_quiesce() helper while the device is still alive, and only
>>> then breaks and resets it via virtio_device_shutdown(). Unlike
>>> virtballoon_remove() the balloon workqueue is not destroyed, as shutdown
>>> does not free the device, and cancel_work_sync() together with stop_update
>>> already prevents any further work from being queued.
>>>
>>> Fixes: 8bd2fa086a04 ("virtio: break and reset virtio devices on device_shutdown()")
>>> Signed-off-by: Denis V. Lunev <[email protected]>
>>> ---
>>> drivers/virtio/virtio_balloon.c | 7 +++++++
>>> 1 file changed, 7 insertions(+)
>>>
>>> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
>>> index 5b02d9191ac6..26fc3c40d5b2 100644
>>> --- a/drivers/virtio/virtio_balloon.c
>>> +++ b/drivers/virtio/virtio_balloon.c
>>> @@ -1137,6 +1137,12 @@ static void virtballoon_remove(struct virtio_device *vdev)
>>> kfree(vb);
>>> }
>>>
>>> +static void virtballoon_shutdown(struct virtio_device *vdev)
>>> +{
>>> + virtballoon_quiesce(vdev->priv);
>>> + virtio_device_shutdown(vdev);
>>> +}
>> I'm curious why virtio_gpu_shutdown() doesn't need that (I did not look into
>> the details).
>>
>> Reviewed-by: David Hildenbrand (Arm) <[email protected]>
>>
> I will spend more time on the other drivers once we are
> done with this one. I have a strong candidate: virtio-mem.
Heh, I briefly checked, and I think it should handle this better.
If virtqueue_add_sgs() fails, it propagates the error (-EIO?) back to the main
loop, where we end up in

	switch (rc) {
	...
	default:
		/* Unknown error, mark as broken */
		dev_err(&vm->vdev->dev, ...
		vm->broken = true;
	}

and just stop.
But I didn't actually look into the details.
--
Cheers,
David