On 6/22/26 16:38, David Hildenbrand (Arm) wrote:
> This email originated from an IP that might not be authorized by the domain
> it was sent from.
> Do not click links or open attachments unless it is an email you expected to
> receive.
> On 6/22/26 15:37, Denis V. Lunev wrote:
>> Commit 8bd2fa086a04 ("virtio: break and reset virtio devices on
>> device_shutdown()") added a generic virtio bus .shutdown handler that
>> breaks and resets every virtio device during device_shutdown(), i.e. on
>> reboot and kexec.
>>
>> virtio_balloon provides no .shutdown of its own, so that generic path
>> runs while the balloon's asynchronous work is still armed. Once the
>> device has been broken, virtqueue_add_inbuf() in
>> virtballoon_free_page_report() returns -EIO and trips its
>> WARN_ON_ONCE(). On a kernel booted with panic_on_warn that turns an
>> ordinary reboot, for example a kexec based upgrade, into a fatal panic
>> in the middle of device_shutdown(), so the machine never reaches the
>> new kernel.
>>
>> Relaxing that single WARN_ON_ONCE() would only hide the symptom: the
>> inflate/deflate and OOM paths do not warn, they call
>> wait_event(vb->acked, ...) and would instead block forever on a broken
>> queue that can no longer complete. The device has to be quiesced, not
>> just kept quiet.
> Ah, so
>
> /* We should always be able to add one buffer to an empty queue. */
> virtqueue_add_outbuf(vq, &sg, 1, vb, GFP_KERNEL);
>
> is not actually correct.
>
> Yeah, quiescing sounds cleaner, although I am thinking whether we should also
> warn if virtqueue_add_outbuf() fails, similar to what we do in
> virtballoon_free_page_report().
Good catch, will do.,
>> Add a .shutdown handler that quiesces the balloon via the shared
>> virtballoon_quiesce() helper while the device is still alive, and only
>> then breaks and resets it. The break and reset are repeated here rather
>> than reused from virtio_dev_shutdown(): drv->shutdown replaces the
>> generic handler rather than augmenting it, so that drivers such as
>> virtio-gpu can opt out of the reset. Unlike virtballoon_remove() the
>> balloon workqueue is not destroyed, as shutdown does not free the
>> device and cancel_work_sync() together with stop_update already prevent
>> any further work from being queued.
>>
>> Fixes: 8bd2fa086a04 ("virtio: break and reset virtio devices on
>> device_shutdown()")
>> Signed-off-by: Denis V. Lunev <[email protected]>
>> ---
>> drivers/virtio/virtio_balloon.c | 10 ++++++++++
>> 1 file changed, 10 insertions(+)
>>
>> diff --git a/drivers/virtio/virtio_balloon.c
>> b/drivers/virtio/virtio_balloon.c
>> index 5b02d9191ac6..e35ada767b4b 100644
>> --- a/drivers/virtio/virtio_balloon.c
>> +++ b/drivers/virtio/virtio_balloon.c
>> @@ -1137,6 +1137,15 @@ static void virtballoon_remove(struct virtio_device
>> *vdev)
>> kfree(vb);
>> }
>>
>> +static void virtballoon_shutdown(struct virtio_device *vdev)
>> +{
>> + virtballoon_quiesce(vdev->priv);
>> +
>> + virtio_break_device(vdev);
>> + virtio_synchronize_cbs(vdev);
>> + vdev->config->reset(vdev);
> I guess it would be good if we wouldn't have to copy what the default handler
> does, but could instead just have it in a reusable core function?
Ok. Sounds great. Will do.
Thanks for review,
Den