On Fri, Sep 21, 2018 at 08:32:22PM +0800, Tiwei Bie wrote:
>On Fri, Sep 21, 2018 at 12:32:57PM +0200, Jens Freimann wrote:
>>This is a basic implementation of packed virtqueues as specified in the
>>Virtio 1.1 draft. A compiled version of the current draft is available
>>at https://github.com/oasis-tcs/virtio-docs.git (or as .pdf at
>>https://github.com/oasis-tcs/virtio-docs/blob/master/virtio-v1.1-packed-wd10.pdf).
>>A packed virtqueue is different from a split virtqueue in that it
>>consists of only a single descriptor ring, which replaces the available
>>and used rings, their indexes and the descriptor buffer.
>>
>>Each descriptor is readable and writable and has a flags field. These
>>flags mark whether a descriptor is available or used. To detect new
>>available descriptors even after the ring has wrapped, device and
>>driver each have a single-bit wrap counter that is flipped from 0 to 1
>>and vice versa every time the last descriptor in the ring is used/made
>>available.
>>
>>The idea behind this is to 1) improve performance by avoiding cache
>>misses and 2) make it easier for devices to implement.
>>Regarding performance: with these patches I get 21.13 Mpps on my
>>system, compared to 18.8 Mpps with the virtio 1.0 code. Packet size
>>was 64 bytes.
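For illustration, here is a minimal sketch of the descriptor layout and
the wrap-counter checks described above. The field layout follows the
virtio 1.1 draft, but the names and helpers (pq_desc, pq_desc_make_avail,
pq_desc_is_used) are made up for this example and are not the code from
these patches:

#include <stdbool.h>
#include <stdint.h>

/* Illustrative packed descriptor, laid out as in the virtio 1.1 draft. */
struct pq_desc {
    uint64_t addr;   /* guest-physical buffer address */
    uint32_t len;    /* buffer length */
    uint16_t id;     /* buffer id, echoed back by the device */
    uint16_t flags;  /* AVAIL/USED bits plus WRITE, NEXT, ... */
};

#define PQ_DESC_F_AVAIL (1u << 7)
#define PQ_DESC_F_USED  (1u << 15)

/* Driver side: hand a descriptor to the device. AVAIL is set to the
 * driver's wrap counter and USED to its inverse, so the device can tell
 * a freshly written descriptor from a stale one after the ring wraps. */
static inline void
pq_desc_make_avail(struct pq_desc *d, bool wrap_counter, uint16_t flags)
{
    flags &= ~(PQ_DESC_F_AVAIL | PQ_DESC_F_USED);
    flags |= wrap_counter ? PQ_DESC_F_AVAIL : PQ_DESC_F_USED;
    /* A real driver needs a write barrier before storing the flags. */
    d->flags = flags;
}

/* Driver side: the device marks a descriptor used by setting AVAIL and
 * USED to the same value; compare against the driver's used wrap counter. */
static inline bool
pq_desc_is_used(const struct pq_desc *d, bool used_wrap_counter)
{
    bool avail = d->flags & PQ_DESC_F_AVAIL;
    bool used = d->flags & PQ_DESC_F_USED;

    return avail == used && used == used_wrap_counter;
}

The point is that the available/used state is encoded entirely in the
flag bits relative to the wrap counters, so neither side has to consult
a separate available or used ring.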
>
>Did you enable multiple queues and use multiple cores on
>the vhost side? If not, I guess the above performance gain
>is a gain on the vhost side rather than the virtio side.
I tested several variations back then and they all looked very good.
But the code has changed a lot since then and I need to do more
benchmarking in any case.
>If you use more cores on the vhost side or the virtio side, will
>you see any performance changes?
>
>Did you do any performance tests with the kernel vhost-net
>backend (with zero-copy enabled and disabled)? I think we
>also need some performance data for these two cases, and
>it can help us make sure that it works with the kernel
>backends.
I tested against vhost-kernel, but only to verify functionality, not
to benchmark.
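For reference, zero-copy TX in the kernel vhost-net backend is switched
via a module parameter, so the two cases could be set up roughly as
below (a sketch only; the qemu command line is abbreviated and the
netdev/device names are example values):

# enable (or =0 to disable) zero-copy TX before starting the guest
modprobe -r vhost_net
modprobe vhost_net experimental_zcopytx=1

# attach the guest NIC to vhost-net via a tap backend
qemu-system-x86_64 ... \
    -netdev tap,id=net0,vhost=on \
    -device virtio-net-pci,netdev=net0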
>And for the "virtio-PMD + vhost-PMD" test cases, I think
>we need the following performance data:
>
>#1. The maximum 1 core performance of virtio PMD when using split ring.
>#2. The maximum 1 core performance of virtio PMD when using packed ring.
>#3. The maximum 1 core performance of vhost PMD when using split ring.
>#4. The maximum 1 core performance of vhost PMD when using packed ring.
>
>Then we can have a clear understanding of the performance
>gain in DPDK with packed ring.
>
>FYI, the maximum 1 core performance of virtio PMD can be
>obtained with the following steps:
>
>1. Launch vhost-PMD with multiple queues, and use multiple
>   CPU cores for forwarding.
>
>2. Launch virtio-PMD with multiple queues and use 1 CPU
>   core for forwarding.
>
>3. Repeat the above two steps, adding more CPU cores for
>   forwarding on the vhost-PMD side, until no further
>   performance increase is seen.
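To make the procedure above concrete, the invocations could look roughly
like this (socket path, core lists and queue counts are arbitrary
example values, not from the patches; the vhost side is then scaled up
by growing its core list and --nb-cores):

# vhost-PMD side: 2 queues, 2 forwarding cores
./testpmd -l 0-2 -n 4 --no-pci --file-prefix=vhost \
    --vdev 'net_vhost0,iface=/tmp/vhost-user0,queues=2' \
    -- -i --rxq=2 --txq=2 --nb-cores=2

# virtio-PMD side: 2 queues, 1 forwarding core
./testpmd -l 3-4 -n 4 --no-pci --file-prefix=virtio \
    --vdev 'net_virtio_user0,path=/tmp/vhost-user0,queues=2' \
    -- -i --rxq=2 --txq=2 --nb-cores=1

Forwarding is then started with "start" (or "start tx_first" to
self-generate traffic) at the testpmd prompts, and the virtio-PMD side
is measured.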
Thanks for the suggestions, I'll come back with more
numbers.
>Besides, I just took a quick glance at the Tx implementation:
>it still assumes the descriptors will be written back in order
>by the device. You can find more details in my comments on
>that patch.
Saw it and noted. I had hoped to be able to avoid the list but
I see no way around it now.
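For what it's worth, a minimal sketch of the kind of bookkeeping this
implies: per-id state plus a free list, so used descriptors can be
reclaimed by the id the device writes back rather than by ring position.
All names here (tx_entry, txq_state, reclaim_used, RING_SIZE) are
invented for the example and are not taken from the patches:

#include <stdint.h>
#include <stddef.h>

#define RING_SIZE 256                        /* example ring size */

struct tx_entry {
    void *cookie;                            /* e.g. the transmitted mbuf */
    uint16_t ndescs;                         /* descriptors used by this buffer */
    uint16_t next;                           /* next free id (free-list link) */
};

struct txq_state {
    struct tx_entry entries[RING_SIZE];      /* indexed by descriptor id */
    uint16_t free_head;                      /* first free id */
};

/* The id the device writes back identifies which buffer completed, so
 * completions can be processed in any order the device chooses. */
static void
reclaim_used(struct txq_state *txq, uint16_t used_id)
{
    struct tx_entry *e = &txq->entries[used_id];

    /* Release the buffer (e.g. rte_pktmbuf_free(e->cookie)), then return
     * the id to the free list so the next transmit can reuse it. */
    e->cookie = NULL;
    e->next = txq->free_head;
    txq->free_head = used_id;
}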
Thanks for your review Tiwei!
regards,
Jens