On 5/29/26 21:35, Andrey Drobyshev wrote:
> Saving or migrating a vhost-blk guest under disk load can produce a state
> that fails to load on the destination:
> 
>   qemu-kvm: VQ 0 size 0x100 < last_avail_idx 0xb8ab - used_idx 0xb934
>   qemu-kvm: Failed to load vhost-blk:virtio
>   qemu-kvm: error while loading state for instance 0x0 of device vhost-blk
>   load of migration failed: Operation not permitted
> 
> virtio_load() rejects the device because the saved used_idx is ahead of
> last_avail_idx, which is impossible for a coherent vring.
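For reference, the check that fires can be modelled as below. This is a sketch that assumes QEMU's virtio_load() computes the in-use count as a 16-bit difference of the free-running indices and compares it with the ring size; the helper names vq_inuse() and vq_state_coherent() are hypothetical. With the logged values, 0xb8ab - 0xb934 wraps to 0xff77, far above the ring size of 0x100.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: requests fetched but not yet completed.  Vring
 * indices are free-running and wrap at 65536, so the difference is taken
 * modulo 2^16.
 */
static uint16_t vq_inuse(uint16_t last_avail_idx, uint16_t used_idx)
{
    return (uint16_t)(last_avail_idx - used_idx);
}

/*
 * Hypothetical model of the load-time check: a coherent vring can never
 * have more requests in use than it has slots.
 */
static bool vq_state_coherent(uint16_t num, uint16_t last_avail_idx,
                              uint16_t used_idx)
{
    uint16_t inuse = vq_inuse(last_avail_idx, used_idx);

    if (inuse > num) {
        fprintf(stderr,
                "VQ size 0x%x < last_avail_idx 0x%x - used_idx 0x%x\n",
                (unsigned int)num, (unsigned int)last_avail_idx,
                (unsigned int)used_idx);
        return false;
    }
    return true;
}
```

The wraparound case (last_avail_idx 0x0005, used_idx 0xfffe) gives an in-use count of 7, so it is accepted; the logged state is rejected.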
> 
> The root cause is that vhost-blk has no "stop fetching" step before the
> device is stopped.  On stop, QEMU's vhost_dev_stop() reads last_avail_idx
> via VHOST_GET_VRING_BASE, but the vhost worker is still running: it keeps
> pulling the avail-ring backlog and completing those requests, advancing
> the guest used->idx past the last_avail_idx that was just sampled.  The
> saved state is therefore incoherent.
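A toy model of this race, assuming a worker that fetches one descriptor and completes one request per iteration (struct vq_model, worker_step() and saved_inuse_without_stop() are hypothetical, not vhost code). Sampling last_avail_idx and then letting the worker keep running is enough for used_idx to overtake the sampled value:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of one virtqueue; all names are hypothetical. */
struct vq_model {
    uint16_t avail_idx;      /* published by the guest */
    uint16_t last_avail_idx; /* next descriptor the worker will fetch */
    uint16_t used_idx;       /* completions published to the guest */
};

/* One worker iteration: fetch one descriptor, complete one request. */
static void worker_step(struct vq_model *vq)
{
    if (vq->last_avail_idx != vq->avail_idx)
        vq->last_avail_idx++;         /* fetch from the avail backlog */
    if (vq->used_idx != vq->last_avail_idx)
        vq->used_idx++;               /* complete an in-flight request */
}

/*
 * Stop sequence with no "stop fetching" step: last_avail_idx is sampled
 * (as VHOST_GET_VRING_BASE would) while the worker keeps running, and
 * used_idx is read afterwards.  Returns the in-use count implied by the
 * saved pair, in 16-bit arithmetic.
 */
static uint16_t saved_inuse_without_stop(struct vq_model vq, int late_steps)
{
    uint16_t saved_last_avail = vq.last_avail_idx;

    for (int i = 0; i < late_steps; i++)
        worker_step(&vq);
    return (uint16_t)(saved_last_avail - vq.used_idx);
}
```

Starting from avail_idx 10, last_avail_idx 4 and used_idx 2, eight late worker iterations push used_idx to 10, so the saved in-use count wraps to 0xfffa.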
> 
> vhost-net does not hit this because it detaches the backend
> (VHOST_NET_SET_BACKEND, fd == -1) before VHOST_GET_VRING_BASE, so its
> worker stops fetching.  vhost-blk had no equivalent operation.
> 
> Teach VHOST_BLK_SET_BACKEND to treat a negative fd as "stop the device":
> detach the backend from every vq (vhost_blk_handle_guest_kick() bails on a
> NULL backend), drain in-flight requests with vhost_blk_flush(), and release
> the backing file.  After this the worker no longer advances the rings, so
> the subsequent VHOST_GET_VRING_BASE reports a final, coherent
> last_avail_idx.  The unconsumed avail backlog stays in the ring and is
> reprocessed once the device is restarted.  The companion QEMU change issues
> this stop before vhost_dev_stop().
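Extending a similar toy model (again hypothetical, not driver code) with a backend flag illustrates why the new ordering is coherent: after the stop step, late worker iterations are no-ops, the in-use count implied by the value VHOST_GET_VRING_BASE returns is zero, and the backlog between last_avail_idx and avail_idx is left for the restarted device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one virtqueue plus its backend; names are hypothetical. */
struct vq_model {
    uint16_t avail_idx;      /* published by the guest */
    uint16_t last_avail_idx; /* next descriptor the worker would fetch */
    uint16_t used_idx;       /* completions published to the guest */
    bool backend_attached;   /* stands in for the per-vq backend pointer */
};

/* One worker iteration; does nothing once the backend is detached. */
static void worker_step(struct vq_model *vq)
{
    if (!vq->backend_attached)
        return;                        /* like a kick with a NULL backend */
    if (vq->last_avail_idx != vq->avail_idx)
        vq->last_avail_idx++;          /* fetch one descriptor */
    if (vq->used_idx != vq->last_avail_idx)
        vq->used_idx++;                /* complete one request */
}

/* Modelled stop path: stop fetching, then drain in-flight requests. */
static void stop_device(struct vq_model *vq)
{
    vq->backend_attached = false;      /* cf. vhost_blk_drop_backends() */
    vq->used_idx = vq->last_avail_idx; /* cf. vhost_blk_flush() */
}

/* Stop first, then sample; returns the in-use count of the saved state. */
static uint16_t saved_inuse_with_stop(struct vq_model *vq, int late_steps)
{
    stop_device(vq);
    uint16_t saved_last_avail = vq->last_avail_idx; /* GET_VRING_BASE */

    for (int i = 0; i < late_steps; i++)
        worker_step(vq);               /* late kicks change nothing */
    return (uint16_t)(saved_last_avail - vq->used_idx);
}
```

Reattaching the backend afterwards lets the worker consume the remaining avail entries, matching the "reprocessed once the device is restarted" behaviour described above.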
> 
> https://virtuozzo.atlassian.net/browse/VSTOR-133464
> Fixes: 40a5928ec730 ("drivers/vhost: vhost-blk accelerator for virtio-blk guests")
> Signed-off-by: Andrey Drobyshev <[email protected]>

Reviewed-by: Pavel Tikhomirov <[email protected]>

> ---
>  drivers/vhost/blk.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/drivers/vhost/blk.c b/drivers/vhost/blk.c
> index b11f08f878f4..1b073011c445 100644
> --- a/drivers/vhost/blk.c
> +++ b/drivers/vhost/blk.c
> @@ -744,6 +744,24 @@ static long vhost_blk_set_backend(struct vhost_blk *blk, int fd)
>       if (ret)
>               goto out_dev;
>  
> +     /*
> +      * fd < 0 means "stop the device".  Detach the backend from every vq so
> +      * vhost_blk_handle_guest_kick() stops fetching descriptors, drain the
> +      * in-flight requests, and release the backing file.
> +      */
> +     if (fd < 0) {
> +             if (!blk->backend) {
> +                     mutex_unlock(&blk->dev.mutex);
> +                     return 0;               /* already stopped */
> +             }
> +             vhost_blk_drop_backends(blk);
> +             vhost_blk_flush(blk);
> +             fput(blk->backend);
> +             blk->backend = NULL;
> +             mutex_unlock(&blk->dev.mutex);
> +             return 0;
> +     }
> +
>       if (blk->backend) {
>               ret = -EBUSY;
>               goto out_dev;

-- 
Best regards, Pavel Tikhomirov
Senior Software Developer, Virtuozzo.

_______________________________________________
Devel mailing list
[email protected]
https://lists.openvz.org/mailman/listinfo/devel
