On Thu, Jan 15, 2026 at 2:49 AM Eugenio Pérez <[email protected]> wrote:
>
> Add support for assigning Address Space Identifiers (ASIDs) to each VQ
> group. This enables mapping each group into a distinct memory space.
>
> The vq group to ASID association is now protected by a rwlock. But the
> mutex domain_lock keeps protecting the domains of all ASIDs, as some
> operations, like the ones related to the bounce buffer size, still
> require locking all the ASIDs.
>
> Signed-off-by: Eugenio Pérez <[email protected]>
>
> ---
> Future improvements can include performance optimizations on top, like
> moving to RCU or thread-synchronized atomics, or hardening by tracking
> the ASID or ASID hashes on unused bits of the DMA address.
>
> Tested virtio_vdpa by manually adding two threads in vduse_set_status:
> one of them modifies the vq group 0 ASID and the other one maps and
> unmaps memory continuously. After a while, the two threads stop and the
> usual work continues. Tested with version 0, version 1 with the old
> ioctl, and version 1 with the new ioctl.
>
> Tested with vhost_vdpa by migrating a VM while pinging on OVS+VDUSE. A
> few workarounds were needed in some parts:
> * Do not enable CVQ before data vqs in QEMU, as VDUSE does not forward
>   the enable message to the userland device. This will be solved in the
>   future.
> * Share the suspended state between all vhost devices in QEMU:
>   https://lists.nongnu.org/archive/html/qemu-devel/2025-11/msg02947.html
> * Implement a fake VDUSE suspend vdpa operation callback that always
>   returns true in the kernel. DPDK suspends the device at the first
>   GET_VRING_BASE.
> * Remove the CVQ blocker in ASID.
>
> The driver vhost_vdpa was also tested with version 0, version 1 with the
> old ioctl, version 1 with the new ioctl but only one ASID, and version 1
> with many ASIDs.
>
> ---
> v12:
> * Use scoped guards for the vq group rwlock, so the one-queue
>   optimization is not missed (Jason proposed to factor them into
>   helpers).
> * Add the _v2 suffix to the vduse_iova_range_v2 struct name, fixing the
>   doc (MST).
> * s/verion/version/ in patch message.
> * Remove trailing ; after a comment (Jason).
>
> v11:
> * Remove duplicated free_pages_exact in vduse_domain_free_coherent
>   (Jason).
> * Do not take the vq groups lock if nas == 1.
> * Do not reset the vq group ASID in vq reset (Jason). Removed the extra
>   function vduse_set_group_asid_nomsg, not needed anymore.
> * Move the vduse_iotlb_entry_v2 argument to a new ioctl, as the argument
>   didn't match the previous VDUSE_IOTLB_GET_FD.
> * Move the asid < dev->nas check to the vdpa core.
>
> v10:
> * Back to the rwlock version so stronger locks are used.
> * Take allocations out of the rwlock.
> * Forbid changing the ASID of a vq group after DRIVER_OK (Jason).
> * Remove bad fetching again of the domain variable in
>   vduse_dev_max_mapping_size (Yongji).
> * Remove unused vdev definition in vdpa map_ops callbacks (kernel test
>   robot).
>
> v9:
> * Replace mutex with rwlock, as the vdpa map_ops can run from atomic
>   context.
>
> v8:
> * Revert the mutex to rwlock change; it needs proper profiling to
>   justify it.
>
> v7:
> * Take the write lock in the error path (Jason).
>
> v6:
> * Make vdpa_dev_add use gotos for error handling (MST).
> * s/(dev->api_version < 1) ?/(dev->api_version < VDUSE_API_VERSION_1) ?/
>   (MST).
> * Fix struct name not matching in the doc.
>
> v5:
> * Properly return errno if copy_to_user returns >0 in the
>   VDUSE_IOTLB_GET_FD ioctl (Jason).
> * Properly set the domain bounce size to divide equally between nas
>   (Jason).
> * Exclude the "padding" member from the only >V1 members in
>   vduse_dev_request.
>
> v4:
> * Divide each domain bounce size between the device bounce size (Jason).
> * Revert unneeded addr = NULL assignment (Jason).
> * Change if (x && (y || z)) return to if (x) { if (y) return; if (z)
>   return; } (Jason).
> * Change a bad multiline comment, using the @ character instead of *
>   (Jason).
> * Consider config->nas == 0 as a fail (Jason).
>
> v3:
> * Get the vduse domain through the vduse_as in the map functions
>   (Jason).
> * Squash with the patch creating the vduse_as struct (Jason).
> * Create VDUSE_DEV_MAX_AS instead of comparing against a magic number
>   (Jason).
>
> v2:
> * Convert the use of mutex to rwlock.
>
> RFC v3:
> * Increase VDUSE_MAX_VQ_GROUPS to 0xffff (Jason). It was set to a lower
>   value to reduce memory consumption, but vqs are already limited to
>   that value and userspace VDUSE is able to allocate that many vqs.
> * Remove TODO about merging the VDUSE_IOTLB_GET_FD ioctl with
>   VDUSE_IOTLB_GET_INFO.
> * Use array_index_nospec in the VDUSE device ioctls.
> * Embed vduse_iotlb_entry into vduse_iotlb_entry_v2.
> * Move the umem mutex to the asid struct so there is no contention
>   between ASIDs.
>
> RFC v2:
> * Make the iotlb entry the last one of vduse_iotlb_entry_v2 so the
>   first part of the struct is the same.
> ---
>  drivers/vdpa/vdpa_user/vduse_dev.c | 385 +++++++++++++++++++----------
>  include/uapi/linux/vduse.h         |  63 ++++-
>  2 files changed, 312 insertions(+), 136 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> index d658f3e1cebf..2727c0c26003 100644
> --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> @@ -9,6 +9,7 @@
>   */
>
>  #include "linux/virtio_net.h"
> +#include <linux/cleanup.h>
>  #include <linux/init.h>
>  #include <linux/module.h>
>  #include <linux/cdev.h>
> @@ -41,6 +42,7 @@
>
>  #define VDUSE_DEV_MAX (1U << MINORBITS)
>  #define VDUSE_DEV_MAX_GROUPS 0xffff
> +#define VDUSE_DEV_MAX_AS 0xffff
>  #define VDUSE_MAX_BOUNCE_SIZE (1024 * 1024 * 1024)
>  #define VDUSE_MIN_BOUNCE_SIZE (1024 * 1024)
>  #define VDUSE_BOUNCE_SIZE (64 * 1024 * 1024)
> @@ -86,7 +88,15 @@ struct vduse_umem {
>  	struct mm_struct *mm;
>  };
>
> +struct vduse_as {
> +	struct vduse_iova_domain *domain;
> +	struct vduse_umem *umem;
> +	struct mutex mem_lock;
> +};
> +
>  struct vduse_vq_group {
> +	rwlock_t as_lock;
> +	struct vduse_as *as; /* Protected by as_lock */
>  	struct vduse_dev *dev;
>  };
>
> @@ -94,7 +104,7 @@ struct vduse_dev {
>  	struct vduse_vdpa *vdev;
>  	struct device *dev;
>  	struct vduse_virtqueue **vqs;
> -	struct vduse_iova_domain *domain;
> +	struct vduse_as *as;
>  	char *name;
>  	struct mutex lock;
>  	spinlock_t msg_lock;
> @@ -122,9 +132,8 @@ struct vduse_dev {
>  	u32 vq_num;
>  	u32 vq_align;
>  	u32 ngroups;
> -	struct vduse_umem *umem;
> +	u32 nas;
>  	struct vduse_vq_group *groups;
> -	struct mutex mem_lock;
>  	unsigned int bounce_size;
>  	struct mutex domain_lock;
>  };
> @@ -314,7 +323,7 @@ static int vduse_dev_set_status(struct vduse_dev *dev, u8 status)
>  	return vduse_dev_msg_sync(dev, &msg);
>  }
>
> -static int vduse_dev_update_iotlb(struct vduse_dev *dev,
> +static int vduse_dev_update_iotlb(struct vduse_dev *dev, u32 asid,
>  				  u64 start, u64 last)
>  {
>  	struct vduse_dev_msg msg = { 0 };
> @@ -323,8 +332,14 @@ static int vduse_dev_update_iotlb(struct vduse_dev *dev,
>  		return -EINVAL;
>
>  	msg.req.type = VDUSE_UPDATE_IOTLB;
> -	msg.req.iova.start = start;
> -	msg.req.iova.last = last;
> +	if (dev->api_version < VDUSE_API_VERSION_1) {
> +		msg.req.iova.start = start;
> +		msg.req.iova.last = last;
> +	} else {
> +		msg.req.iova_v2.start = start;
> +		msg.req.iova_v2.last = last;
> +		msg.req.iova_v2.asid = asid;
> +	}
>
>  	return vduse_dev_msg_sync(dev, &msg);
>  }
> @@ -439,11 +454,14 @@ static __poll_t vduse_dev_poll(struct file *file, poll_table *wait)
>  static void vduse_dev_reset(struct vduse_dev *dev)
>  {
>  	int i;
> -	struct vduse_iova_domain *domain = dev->domain;
>
>  	/* The coherent mappings are handled in vduse_dev_free_coherent() */
> -	if (domain && domain->bounce_map)
> -		vduse_domain_reset_bounce_map(domain);
> +	for (i = 0; i < dev->nas; i++) {
> +		struct vduse_iova_domain *domain = dev->as[i].domain;
> +
> +		if (domain && domain->bounce_map)
> +			vduse_domain_reset_bounce_map(domain);
> +	}
>
>  	down_write(&dev->rwsem);
>
> @@ -622,6 +640,42 @@ static union virtio_map vduse_get_vq_map(struct vdpa_device *vdpa, u16 idx)
>  	return ret;
>  }
>
> +DEFINE_GUARD(vq_group_as_read_lock, struct vduse_vq_group *,
> +	     if (_T->dev->nas > 1)
> +		     read_lock(&_T->as_lock),
> +	     if (_T->dev->nas > 1)
> +		     read_unlock(&_T->as_lock))
> +
> +DEFINE_GUARD(vq_group_as_write_lock, struct vduse_vq_group *,
> +	     if (_T->dev->nas > 1)
> +		     write_lock(&_T->as_lock),
> +	     if (_T->dev->nas > 1)
> +		     write_unlock(&_T->as_lock))
> +
> +static int vduse_set_group_asid(struct vdpa_device *vdpa, unsigned int group,
> +				unsigned int asid)
> +{
> +	struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> +	struct vduse_dev_msg msg = { 0 };
> +	int r;
> +
> +	if (dev->api_version < VDUSE_API_VERSION_1)
> +		return -EINVAL;
> +
> +	msg.req.type = VDUSE_SET_VQ_GROUP_ASID;
> +	msg.req.vq_group_asid.group = group;
> +	msg.req.vq_group_asid.asid = asid;
> +
> +	r = vduse_dev_msg_sync(dev, &msg);
> +	if (r < 0)
> +		return r;
> +
> +	guard(vq_group_as_write_lock)(&dev->groups[group]);
> +	dev->groups[group].as = &dev->as[asid];
> +
> +	return 0;
> +}
> +
>  static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
>  				   struct vdpa_vq_state *state)
>  {
> @@ -793,13 +847,13 @@ static int vduse_vdpa_set_map(struct vdpa_device *vdpa,
>  	struct vduse_dev *dev = vdpa_to_vduse(vdpa);
>  	int ret;
>
> -	ret = vduse_domain_set_map(dev->domain, iotlb);
> +	ret = vduse_domain_set_map(dev->as[asid].domain, iotlb);
>  	if (ret)
>  		return ret;
>
> -	ret = vduse_dev_update_iotlb(dev, 0ULL, ULLONG_MAX);
> +	ret = vduse_dev_update_iotlb(dev, asid, 0ULL, ULLONG_MAX);
>  	if (ret) {
> -		vduse_domain_clear_map(dev->domain, iotlb);
> +		vduse_domain_clear_map(dev->as[asid].domain, iotlb);
>  		return ret;
>  	}
>
> @@ -842,6 +896,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
>  	.get_vq_affinity = vduse_vdpa_get_vq_affinity,
>  	.reset = vduse_vdpa_reset,
>  	.set_map = vduse_vdpa_set_map,
> +	.set_group_asid = vduse_set_group_asid,
>  	.get_vq_map = vduse_get_vq_map,
>  	.free = vduse_vdpa_free,
>  };
> @@ -850,15 +905,13 @@ static void vduse_dev_sync_single_for_device(union virtio_map token,
>  					     dma_addr_t dma_addr, size_t size,
>  					     enum dma_data_direction dir)
>  {
> -	struct vduse_dev *vdev;
>  	struct vduse_iova_domain *domain;
>
>  	if (!token.group)
>  		return;
>
> -	vdev = token.group->dev;
> -	domain = vdev->domain;
> -
> +	guard(vq_group_as_read_lock)(token.group);
> +	domain = token.group->as->domain;
>  	vduse_domain_sync_single_for_device(domain, dma_addr, size, dir);
>  }
>
> @@ -866,15 +919,13 @@ static void vduse_dev_sync_single_for_cpu(union virtio_map token,
>  					  dma_addr_t dma_addr, size_t size,
>  					  enum dma_data_direction dir)
>  {
> -	struct vduse_dev *vdev;
>  	struct vduse_iova_domain *domain;
>
>  	if (!token.group)
>  		return;
>
> -	vdev = token.group->dev;
> -	domain = vdev->domain;
> -
> +	guard(vq_group_as_read_lock)(token.group);
> +	domain = token.group->as->domain;
>  	vduse_domain_sync_single_for_cpu(domain, dma_addr, size, dir);
>  }
>
> @@ -883,15 +934,13 @@ static dma_addr_t vduse_dev_map_page(union virtio_map token, struct page *page,
>  				     enum dma_data_direction dir,
>  				     unsigned long attrs)
>  {
> -	struct vduse_dev *vdev;
>  	struct vduse_iova_domain *domain;
>
>  	if (!token.group)
>  		return DMA_MAPPING_ERROR;
>
> -	vdev = token.group->dev;
> -	domain = vdev->domain;
> -
> +	guard(vq_group_as_read_lock)(token.group);
> +	domain = token.group->as->domain;
>  	return vduse_domain_map_page(domain, page, offset, size, dir, attrs);
>  }
>
> @@ -899,23 +948,19 @@ static void vduse_dev_unmap_page(union virtio_map token, dma_addr_t dma_addr,
>  				 size_t size, enum dma_data_direction dir,
>  				 unsigned long attrs)
>  {
> -	struct vduse_dev *vdev;
>  	struct vduse_iova_domain *domain;
>
>  	if (!token.group)
>  		return;
>
> -	vdev = token.group->dev;
> -	domain = vdev->domain;
> -
> -	return vduse_domain_unmap_page(domain, dma_addr, size, dir, attrs);
> +	guard(vq_group_as_read_lock)(token.group);
> +	domain = token.group->as->domain;
> +	vduse_domain_unmap_page(domain, dma_addr, size, dir, attrs);
>  }
>
>  static void *vduse_dev_alloc_coherent(union virtio_map token, size_t size,
>  				      dma_addr_t *dma_addr, gfp_t flag)
>  {
> -	struct vduse_dev *vdev;
> -	struct vduse_iova_domain *domain;
>  	void *addr;
>
>  	*dma_addr = DMA_MAPPING_ERROR;
> @@ -926,11 +971,15 @@ static void *vduse_dev_alloc_coherent(union virtio_map token, size_t size,
>  	if (!addr)
>  		return NULL;
>
> -	vdev = token.group->dev;
> -	domain = vdev->domain;
> -	*dma_addr = vduse_domain_alloc_coherent(domain, size, addr);
> -	if (*dma_addr == DMA_MAPPING_ERROR)
> -		goto err;
> +	{
> +		struct vduse_iova_domain *domain;
> +
> +		guard(vq_group_as_read_lock)(token.group);
> +		domain = token.group->as->domain;
> +		*dma_addr = vduse_domain_alloc_coherent(domain, size, addr);
> +		if (*dma_addr == DMA_MAPPING_ERROR)
> +			goto err;
> +	}
Nit: a standalone block like this usually suggests there is a chance to
factor the code out into a helper, maybe something like the sketch below?
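For illustration only, a minimal sketch of what such a helper could look
like, built on the vq_group_as_read_lock guard this patch already defines.
The helper name vduse_vq_group_map_coherent is invented for this review;
it is not part of the series:

	/*
	 * Hypothetical helper: resolve the group's current domain under
	 * the AS read lock and do the coherent mapping through it, so the
	 * lock scope is confined to the helper instead of an open-coded
	 * block.
	 */
	static dma_addr_t vduse_vq_group_map_coherent(struct vduse_vq_group *group,
						      size_t size, void *addr)
	{
		guard(vq_group_as_read_lock)(group);

		return vduse_domain_alloc_coherent(group->as->domain, size, addr);
	}

vduse_dev_alloc_coherent could then drop the standalone block:

	*dma_addr = vduse_vq_group_map_coherent(token.group, size, addr);
	if (*dma_addr == DMA_MAPPING_ERROR)
		goto err;

Thanks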

