Li Chen wrote:
> Under heavy concurrent flush traffic, virtio-pmem can overflow its request
> virtqueue (req_vq): virtqueue_add_sgs() starts returning -ENOSPC and the
> driver logs "no free slots in the virtqueue". Shortly after that the
> device enters VIRTIO_CONFIG_S_NEEDS_RESET and flush requests fail with
> "virtio pmem device needs a reset".
> 
> Serialize virtio_pmem_flush() with a per-device mutex so only one flush
> request is in-flight at a time. This prevents req_vq descriptor overflow
> under high concurrency.
> 
> Reproducer (guest with virtio-pmem):
>   - mkfs.ext4 -F /dev/pmem0
>   - mount -t ext4 -o dax,noatime /dev/pmem0 /mnt/bench
>   - fio: ioengine=io_uring rw=randwrite bs=4k iodepth=64 numjobs=64
>         direct=1 fsync=1 runtime=30s time_based=1

I don't see this error.

<file>
13:28:50 > cat foo.fio 
# test http://lore.kernel.org/[email protected]

[global]
filename=/mnt/bench/foo
ioengine=io_uring
size=1G
bs=4K
iodepth=64
numjobs=64
direct=1
fsync=1
runtime=30s
time_based=1

[rand-write]
rw=randwrite
</file>

It's possible I'm doing something wrong.  Can you share your QEMU command
line, or more details on the bug you are seeing?

>   - dmesg: "no free slots in the virtqueue"
>            "virtio pmem device needs a reset"
> 
> Fixes: 6e84200c0a29 ("virtio-pmem: Add virtio pmem driver")
> Signed-off-by: Li Chen <[email protected]>
> ---
>  drivers/nvdimm/nd_virtio.c   | 15 +++++++++++----
>  drivers/nvdimm/virtio_pmem.c |  1 +
>  drivers/nvdimm/virtio_pmem.h |  4 ++++
>  3 files changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
> index c3f07be4aa22..827a17fe7c71 100644
> --- a/drivers/nvdimm/nd_virtio.c
> +++ b/drivers/nvdimm/nd_virtio.c
> @@ -44,19 +44,24 @@ static int virtio_pmem_flush(struct nd_region *nd_region)
>       unsigned long flags;
>       int err, err1;
>  
> +     might_sleep();
> +     mutex_lock(&vpmem->flush_lock);

Assuming this does fix a bug, I'd rather use guard() here:

        guard(mutex)(&vpmem->flush_lock);

Then you can drop the gotos and the out_unlock label.
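
As a rough illustration of the pattern, here is a userspace sketch (not
the kernel code itself) built on the compiler's cleanup attribute, which
is the same mechanism the kernel's guard() relies on.  The names
flush_dev, flush_one and MUTEX_GUARD are made up for this example, and a
pthread mutex stands in for struct mutex:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-device state. */
struct flush_dev {
	pthread_mutex_t flush_lock;
	bool needs_reset;
};

/* Cleanup handler: called with a pointer to the guard variable. */
static void mutex_guard_release(pthread_mutex_t **m)
{
	int rc = pthread_mutex_unlock(*m);

	assert(rc == 0);
	(void)rc;
}

/* Take the lock now; drop it when the enclosing scope is left. */
#define MUTEX_GUARD(m)							\
	pthread_mutex_t *guard_ptr_					\
		__attribute__((cleanup(mutex_guard_release), unused)) =	\
		(pthread_mutex_lock(m), (m))

/* Hypothetical flush path: every return releases the lock. */
static int flush_one(struct flush_dev *dev)
{
	void *req;

	MUTEX_GUARD(&dev->flush_lock);

	if (dev->needs_reset)
		return -EIO;		/* unlocked automatically */

	req = malloc(64);
	if (!req)
		return -ENOMEM;		/* unlocked automatically */

	/* ... submit the request and wait for completion ... */

	free(req);
	return 0;			/* unlocked automatically */
}
```

The early returns stay as plain returns, so the error paths do not need
a shared unlock label.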

Also, does this affect performance at all?

Ira

> +
>       /*
>        * Don't bother to submit the request to the device if the device is
>        * not activated.
>        */
>       if (vdev->config->get_status(vdev) & VIRTIO_CONFIG_S_NEEDS_RESET) {
>               dev_info(&vdev->dev, "virtio pmem device needs a reset\n");
> -             return -EIO;
> +             err = -EIO;
> +             goto out_unlock;
>       }
>  
> -     might_sleep();
>       req_data = kmalloc(sizeof(*req_data), GFP_KERNEL);
> -     if (!req_data)
> -             return -ENOMEM;
> +     if (!req_data) {
> +             err = -ENOMEM;
> +             goto out_unlock;
> +     }
>  
>       req_data->done = false;
>       init_waitqueue_head(&req_data->host_acked);
> @@ -103,6 +108,8 @@ static int virtio_pmem_flush(struct nd_region *nd_region)
>       }
>  
>       kfree(req_data);
> +out_unlock:
> +     mutex_unlock(&vpmem->flush_lock);
>       return err;
>  };

[snip]
