-----Original Message-----
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Maxime Coquelin
Sent: Thursday, November 3, 2016 4:11 PM
To: Wang, Zhihong <zhihong.wang at intel.com>; Yuanhan Liu <yuanhan.liu at 
linux.intel.com>
Cc: mst at redhat.com; dev at dpdk.org; vkaplans at redhat.com
Subject: Re: [dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support to 
the TX path


>
> The strange thing with both of our figures is that they are below what 
> I obtain with my SandyBridge machine. The SB CPU freq is 4% higher, 
> but that doesn't explain the gap between the measurements.
>
> I'm continuing the investigations on my side.
> Maybe we should fix a deadline, and decide to disable indirect in 
> Virtio PMD if the root cause is not identified/fixed by then?
>
> Yuanhan, what do you think?

I have done some measurements using perf, and now have a better understanding 
of what happens.

With indirect descriptors, I can see a cache miss when fetching the descriptors 
in the indirect table. This is expected, so we prefetch the first desc as soon 
as possible, but that is still not soon enough to hide the miss.
In the direct descriptors case, the desc in the virtqueue seems to remain in 
the cache from its previous use, so we get a hit.
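
For illustration, here is a minimal sketch of the prefetch idea using DPDK's 
rte_prefetch0(); the struct layout and function name are simplified assumptions 
for this email, not the actual vhost code:

#include <stdint.h>
#include <rte_prefetch.h>

/* Simplified stand-in for the virtio descriptor layout (the real code
 * uses struct vring_desc); names here are illustrative only. */
struct desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/* As soon as the guest address of the indirect table has been translated,
 * issue a prefetch of its first descriptor so that the later read has a
 * better chance of hitting the cache. */
static inline void
prefetch_indirect_table(const struct desc *descs)
{
	rte_prefetch0(&descs[0]);
}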

That said, in a realistic use case, I think we should not get a hit, even with 
direct descriptors.
Indeed, the test case uses testpmd on the guest side with forwarding set to IO 
mode, which means the packet content is never accessed by the guest.

In my experiments, I usually set the "macswap" forwarding mode, which swaps the 
src and dest MAC addresses in the packet. I find it more realistic, because I 
don't see the point in sending packets to the guest if they are never accessed 
(not even their headers).
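
For reference, a minimal sketch of what macswap does per packet (written with 
current DPDK names; older releases spell these ether_hdr, ether_addr_copy and 
d_addr/s_addr):

#include <rte_ether.h>
#include <rte_mbuf.h>

/* Swap the source and destination MAC addresses of one packet, which is
 * essentially what testpmd's macswap forwarding mode does. */
static inline void
macswap_one(struct rte_mbuf *m)
{
	struct rte_ether_hdr *eth =
		rte_pktmbuf_mtod(m, struct rte_ether_hdr *);
	struct rte_ether_addr tmp;

	rte_ether_addr_copy(&eth->dst_addr, &tmp);
	rte_ether_addr_copy(&eth->src_addr, &eth->dst_addr);
	rte_ether_addr_copy(&tmp, &eth->src_addr);
}

In testpmd this mode is selected at runtime with "set fwd macswap" (versus 
"set fwd io" for the no-touch case).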

I tried the test case again, this time setting the forwarding mode to macswap 
in the guest. Now I get the same performance with both direct and indirect 
descriptors (indirect is even a little better with a small optimization, 
consisting of systematically prefetching the first two descs, since we know 
they are contiguous).
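
That small optimization would look roughly like this (again just a sketch, 
reusing the illustrative struct desc from the earlier snippet):

#include <rte_prefetch.h>

/* The first two descriptors of the indirect table are contiguous, so
 * prefetch both up front (they may or may not share a cache line,
 * depending on the table's alignment). */
static inline void
prefetch_first_two_descs(const struct desc *descs)
{
	rte_prefetch0(&descs[0]);
	rte_prefetch0(&descs[1]);
}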

Do you agree we should assume that the packet (header and/or buf) will always 
be accessed by the guest application?
----Maybe that's true in many real use cases. But we also need to ensure there 
is no performance drop for "IO fwd". As far as I know, the OVS-DPDK team will 
benchmark the virtio part based on "IO fwd", so they would also see some 
performance drop. We were wondering whether it's possible to make the feature 
off by default, so that anyone who wants to use it can turn it on. People can 
then choose whether to use the feature, just like vhost dequeue zero copy.
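
As an illustration of the "off by default, opt in explicitly" idea, a vhost 
application can mask the feature per socket. A sketch, assuming the current 
per-socket rte_vhost API (the API at the time of this thread differed):

#include <linux/virtio_ring.h>  /* VIRTIO_RING_F_INDIRECT_DESC */
#include <rte_vhost.h>

/* Illustrative only: refuse to negotiate indirect descriptors on a given
 * vhost-user socket, so the guest falls back to direct descriptors.
 * Must be called after rte_vhost_driver_register(). */
static int
disable_indirect(const char *socket_path)
{
	return rte_vhost_driver_disable_features(socket_path,
			1ULL << VIRTIO_RING_F_INDIRECT_DESC);
}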

If so, do you agree we should keep indirect descs enabled, and maybe update the 
test cases?

Thanks,
Maxime
