On Mon, Jul 28, 2025 at 6:17 PM Michael S. Tsirkin <m...@redhat.com> wrote: > > On Mon, Jul 28, 2025 at 02:41:29PM +0800, Jason Wang wrote: > > This patch implements in order support for both split virtqueue and > > packed virtqueue. Perfomance could be gained for the device where the > > memory access could be expensive (e.g vhost-net or a real PCI device): > > > > Benchmark with KVM guest: > > > > Vhost-net on the host: (pktgen + XDP_DROP): > > > > in_order=off | in_order=on | +% > > TX: 5.20Mpps | 6.20Mpps | +19% > > RX: 3.47Mpps | 3.61Mpps | + 4% > > > > Vhost-user(testpmd) on the host: (pktgen/XDP_DROP): > > > > For split virtqueue: > > > > in_order=off | in_order=on | +% > > TX: 5.60Mpps | 5.60Mpps | +0.0% > > RX: 9.16Mpps | 9.61Mpps | +4.9% > > > > For packed virtqueue: > > > > in_order=off | in_order=on | +% > > TX: 5.60Mpps | 5.70Mpps | +1.7% > > RX: 10.6Mpps | 10.8Mpps | +1.8% > > > > Benchmark also shows no performance impact for in_order=off for queue > > size with 256 and 1024. > > > > Signed-off-by: Jason Wang <jasow...@redhat.com> > > Signed-off-by: Michael S. Tsirkin <m...@redhat.com> > > --- > > Changes since V4: > > - Fix build error when DEBUG is enabled > > - Fix function duplications > > - Remove unnecessary new lines > > --- > > drivers/virtio/virtio_ring.c | 421 +++++++++++++++++++++++++++++++++-- > > 1 file changed, 401 insertions(+), 20 deletions(-) > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c > > index e675d8305dbf..c6558e271f97 100644 > > --- a/drivers/virtio/virtio_ring.c > > +++ b/drivers/virtio/virtio_ring.c > > @@ -70,11 +70,14 @@ > > enum vq_layout { > > SPLIT = 0, > > PACKED, > > + SPLIT_IN_ORDER, > > + PACKED_IN_ORDER, > > VQ_TYPE_MAX, > > }; > > > how about specifying the #s here? > SPLIT = 0, > PACKED = 1, > IN_ORDER = 2, > SPLIT_IN_ORDER = 2, > PACKED_IN_ORDER = 3, > VQ_TYPE_MAX, > > and then > > static inline bool virtqueue_is_packed(const struct vring_virtqueue *vq) > { > return vq->layout & PACKED; > } > > static inline bool virtqueue_is_in_order(const struct vring_virtqueue *vq) > { > return vq->layout & IN_ORDER; > } > > which is a tiny bit less code. > > worth doing?
Probably not, for example it would introduce branches. As we discussed, once we have sufficient optimizations, most of the branches could be saved. > > > > > struct vring_desc_state_split { > > void *data; /* Data for callback. */ > > + u32 total_len; /* Buffer Length */ > > > > /* Indirect desc table and extra table, if any. These two will be > > * allocated together. So we won't stress more to the memory > > allocator. > > > this is only used for in_order, and it increases the struct size > by half due to padding. why not a separate struct? > Or if you like, reuse vring_desc_state_packed - it is same > size with this addition. > > > > @@ -84,6 +87,7 @@ struct vring_desc_state_split { > > > > struct vring_desc_state_packed { > > void *data; /* Data for callback. */ > > + u32 total_len; /* Buffer Length */ > > > > /* Indirect desc table and extra table, if any. These two will be > > * allocated together. So we won't stress more to the memory > > allocator. > > there's empty space at the end of this struct. > struct vring_desc_state_packed { > void *data; /* Data for callback. */ > u32 total_len; /* Buffer Length */ > > /* Indirect desc table and extra table, if any. These two will be > * allocated together. So we won't stress more to the memory > allocator. > */ > struct vring_packed_desc *indir_desc; > u16 num; /* Descriptor list length. */ > u16 last; /* The last desc state in a list. */ > }; > > why not put it there? > Fine. Thanks