Thanks Yinan for reporting the regression and Gavin for the analysis.

On 9/10/19 11:48 AM, Gavin Hu (Arm Technology China) wrote:
> Hi Yinan,
> 
> We have done a comparative analysis and found that, with the old code, the
> if (weak_barriers)/else branch was optimized away on x86, because
> rte_smp_wmb and rte_cio_wmb are identical there.
> http://git.dpdk.org/dpdk/tree/drivers/net/virtio/virtqueue.h#n49
> With Joyce's patches applied, the branch is no longer optimized away. This
> requires additional CPU cycles and causes the slight degradation on x86.
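A minimal stand-alone sketch of why the old branch cost nothing on x86. It assumes the x86 definitions in DPDK's rte_atomic.h, where rte_smp_wmb() and rte_cio_wmb() both expand to a compiler-only barrier. SMP_WMB, CIO_WMB, sketch_virtio_wmb and sketch_publish are hypothetical local stand-ins, loosely modeled on the virtio_wmb() helper in virtqueue.h; they are not DPDK code. Both arms emit no instructions, so an optimizing compiler can drop the test on weak_barriers altogether.

```c
#include <stdint.h>

/* Compiler-only barrier (GCC/Clang): emits no CPU instruction. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

/* Assumed x86 mapping: both DPDK write barriers are compiler barriers. */
#define SMP_WMB() COMPILER_BARRIER()   /* stand-in for rte_smp_wmb() */
#define CIO_WMB() COMPILER_BARRIER()   /* stand-in for rte_cio_wmb() */

/* Hypothetical helper in the style of the pre-series virtio_wmb(). */
static inline void sketch_virtio_wmb(uint8_t weak_barriers)
{
	if (weak_barriers)
		SMP_WMB();   /* software backend, e.g. vhost on another core */
	else
		CIO_WMB();   /* hardware backend, e.g. an offloading NIC */
	/* Both arms generate identical (empty) code, so the branch folds away. */
}

/* Write the descriptor payload, order it, then write the flags. */
static void sketch_publish(volatile uint16_t *flags, uint64_t *addr,
			   uint64_t buf, uint16_t new_flags,
			   uint8_t weak_barriers)
{
	*addr = buf;
	sketch_virtio_wmb(weak_barriers);
	*flags = new_flags;
}
```

When one arm emits something different, such as a C11 release store or a real fence, the test on weak_barriers has to stay in the generated code.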
> 
> The patches improved performance on aarch64 by about 9%, as indicated in
> the cover letter. While I think over a solution to the degradation on x86,
> could you help answer the following:
> 1. Is rte_cio_wmb sufficient for the non-weak-barrier case (HW offloading)?
>    I ask because Intel NIC PMDs almost never use it. rte_wmb is much more
>    widely used to notify the NIC device. Is there any difference between a
>    virtio-ring-compatible smartNIC device (or vDPA?) and i40e-like devices?
> 2. If rte_cio_wmb is not sufficient for this case and is replaced by a
>    stronger barrier such as sfence, then the compiler can no longer
>    eliminate the branch. The question then becomes one of correct barrier
>    usage rather than of the degradation.
> 
> Any comments are welcome!

It may be worth having Yinan try rte_wmb instead of rte_cio_wmb without
the series applied, just to confirm that the extra branch is the cause.
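The proposed experiment can be sketched as follows. This assumes rte_wmb() is an sfence on x86, represented here by `_mm_sfence()` from <xmmintrin.h>, with a portable fallback elsewhere. The helper names are hypothetical. Because the two arms now differ, the compiler must keep the branch even in the old code. If the PVP setup takes the weak_barriers path, the sfence itself never executes, so a similar drop would point to the branch rather than to the barrier.

```c
#include <stdint.h>

#if defined(__SSE__)
#include <xmmintrin.h>
#define HW_WMB() _mm_sfence()   /* assumed x86 expansion of rte_wmb() */
#else
#define HW_WMB() __atomic_thread_fence(__ATOMIC_SEQ_CST)   /* fallback */
#endif

/* Compiler-only barrier, standing in for rte_smp_wmb() on x86. */
#define SMP_WMB() __asm__ __volatile__("" ::: "memory")

/* Old helper with rte_cio_wmb() swapped for a real store fence: the two
 * arms now generate different code, so the branch cannot be folded. */
static inline void sketch_virtio_wmb_experiment(uint8_t weak_barriers)
{
	if (weak_barriers)
		SMP_WMB();
	else
		HW_WMB();
}

/* Order earlier descriptor writes before publishing the flags. */
static void sketch_store_flags(volatile uint16_t *flags, uint16_t v,
			       uint8_t weak_barriers)
{
	sketch_virtio_wmb_experiment(weak_barriers);
	*flags = v;
}
```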

Maxime

> Best Regards,
> Gavin
> 
>> -----Original Message-----
>> From: Wang, Yinan <[email protected]>
>> Sent: Tuesday, September 10, 2019 11:54 AM
>> To: Maxime Coquelin <[email protected]>; Joyce Kong (Arm
>> Technology China) <[email protected]>; [email protected]
>> Cc: nd <[email protected]>; Bie, Tiwei <[email protected]>; Wang, Zhihong
>> <[email protected]>; [email protected]; Wang, Xiao W
>> <[email protected]>; Liu, Yong <[email protected]>;
>> [email protected]; Honnappa Nagarahalli
>> <[email protected]>; Gavin Hu (Arm Technology China)
>> <[email protected]>
>> Subject: RE: [dpdk-dev] [PATCH v3 1/2] virtio: one way barrier for packed
>> vring desc avail flags
>>
>>
>> Hi Joyce,
>>
>> I just tested the performance impact of your patch set on code base commit
>> id d03d8622db48918d14bfe805641b1766ecc40088. After applying your v3 patch
>> set, seven paths of the vhost/virtio PVP test show a performance drop:
>>
>> PVP vhost/virtio 1c1q test          before patch  after patch
>> test_perf_pvp_inorder_mergeable            7.603        7.474
>> test_perf_pvp_inorder_no_mergeable         7.642        7.525
>> test_perf_pvp_mergeable                    7.556        7.431
>> test_perf_pvp_normal                       7.554        7.478
>> test_perf_pvp_vector_rx                    7.581        7.469
>> test_perf_pvp_virtio11_mergeable           7.068        6.905
>> test_perf_pvp_virtio11_normal              7.088        6.888
>>
>> Thanks,
>> Yinan
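To put the numbers above in perspective, the small C sketch below computes the relative drop for each path as (before - after) / before * 100. The values are copied from the table. The report does not state their units.

```c
#include <stdio.h>

/* Figures copied from the PVP report above (units not stated there). */
struct pvp_result {
	const char *test;
	double before;
	double after;
};

static const struct pvp_result results[] = {
	{ "test_perf_pvp_inorder_mergeable",    7.603, 7.474 },
	{ "test_perf_pvp_inorder_no_mergeable", 7.642, 7.525 },
	{ "test_perf_pvp_mergeable",            7.556, 7.431 },
	{ "test_perf_pvp_normal",               7.554, 7.478 },
	{ "test_perf_pvp_vector_rx",            7.581, 7.469 },
	{ "test_perf_pvp_virtio11_mergeable",   7.068, 6.905 },
	{ "test_perf_pvp_virtio11_normal",      7.088, 6.888 },
};

/* Relative drop in percent: (before - after) / before * 100. */
static double drop_pct(const struct pvp_result *r)
{
	return (r->before - r->after) / r->before * 100.0;
}

/* Print one line per test path with its relative drop. */
static void print_drops(void)
{
	size_t n = sizeof(results) / sizeof(results[0]);

	for (size_t i = 0; i < n; i++)
		printf("%-36s %5.2f%%\n", results[i].test, drop_pct(&results[i]));
}
```

By this arithmetic, the drop is about 1.0% to 1.7% for the five non-virtio11 paths and about 2.3% to 2.8% for the two virtio11 paths.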
>>
>>> -----Original Message-----
>>> From: dev [mailto:[email protected]] On Behalf Of Maxime Coquelin
>>> Sent: Monday, September 9, 2019 6:10 PM
>>> To: Joyce Kong <[email protected]>; [email protected]
>>> Cc: [email protected]; Bie, Tiwei <[email protected]>; Wang, Zhihong
>>> <[email protected]>; [email protected]; Wang, Xiao W
>>> <[email protected]>; Liu, Yong <[email protected]>;
>>> [email protected]; [email protected];
>> [email protected]
>>> Subject: Re: [dpdk-dev] [PATCH v3 1/2] virtio: one way barrier for packed
>> vring
>>> desc avail flags
>>>
>>>
>>>
>>> On 9/9/19 11:14 AM, Joyce Kong wrote:
>>>> If VIRTIO_F_ORDER_PLATFORM (36) is not negotiated, the frontend and
>>>> backend are assumed to be implemented in software, that is, they can
>>>> run on identical CPUs in an SMP configuration.
>>>> Thus a weak form of memory barrier such as rte_smp_r/wmb, rather than
>>>> rte_cio_r/wmb, is sufficient for this case (vq->hw->weak_barriers == 1)
>>>> and yields better performance.
>>>> For this case, the patch improves performance further by replacing the
>>>> two-way barriers with C11 one-way barriers for the avail flags in the
>>>> packed ring.
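A minimal sketch of the one-way barrier on the producer side. It assumes the packed-ring descriptor layout from the virtio 1.1 specification (addr, len, id, flags), with the AVAIL and USED flags at bits 7 and 15. The struct and function names are hypothetical, and the GCC/Clang `__atomic` builtins stand in for whatever wrappers the patch actually uses. The flags are written with a release store instead of a write barrier followed by a plain store. The release store orders all earlier writes to the descriptor before the new flags become visible.

```c
#include <stdint.h>

#define SKETCH_DESC_F_AVAIL (1u << 7)    /* virtio 1.1 packed-ring AVAIL bit */
#define SKETCH_DESC_F_USED  (1u << 15)   /* virtio 1.1 packed-ring USED bit */

/* Packed-ring descriptor layout per the virtio 1.1 spec (endianness ignored). */
struct sketch_packed_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t id;
	uint16_t flags;
};

/* Flags marking a descriptor available for the given wrap counter:
 * AVAIL equals the wrap counter and USED differs from it. */
static inline uint16_t sketch_avail_flags(int wrap)
{
	return wrap ? SKETCH_DESC_F_AVAIL : SKETCH_DESC_F_USED;
}

/* Producer: fill the descriptor, then publish it with a one-way barrier. */
static inline void sketch_publish_avail(struct sketch_packed_desc *d,
					uint64_t addr, uint32_t len,
					uint16_t id, int wrap)
{
	d->addr = addr;
	d->len = len;
	d->id = id;
	/* Release store: the writes above cannot be reordered after it, so a
	 * consumer that observes the new flags also observes addr/len/id. */
	__atomic_store_n(&d->flags, sketch_avail_flags(wrap), __ATOMIC_RELEASE);
}
```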
>>>>
>>>> Meanwhile, a read barrier is required to ensure ordering between the
>>>> reads of a descriptor's flags and of its content [1]. With C11, a
>>>> load-acquire can enforce this ordering instead of an rmb barrier.
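The matching consumer side can be sketched the same way. The same assumptions apply: hypothetical names, and the virtio 1.1 flag bits and availability rule. The flags are loaded with acquire semantics. Reads of the descriptor fields that follow an acquire load which observed the available flags cannot be satisfied before that load. This is the ordering an rmb would otherwise provide.

```c
#include <stdint.h>

#define SKETCH_DESC_F_AVAIL (1u << 7)    /* virtio 1.1 packed-ring AVAIL bit */
#define SKETCH_DESC_F_USED  (1u << 15)   /* virtio 1.1 packed-ring USED bit */

/* Packed-ring descriptor layout per the virtio 1.1 spec (endianness ignored). */
struct sketch_packed_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t id;
	uint16_t flags;
};

/* Available when AVAIL matches the wrap counter and USED does not. */
static inline int sketch_desc_is_avail(uint16_t flags, int wrap)
{
	return wrap == !!(flags & SKETCH_DESC_F_AVAIL) &&
	       wrap != !!(flags & SKETCH_DESC_F_USED);
}

/* Consumer: returns 1 and copies the descriptor if it is available. */
static inline int sketch_read_if_avail(struct sketch_packed_desc *d, int wrap,
				       struct sketch_packed_desc *out)
{
	/* Load-acquire: the reads of addr/len/id below are ordered after it. */
	uint16_t flags = __atomic_load_n(&d->flags, __ATOMIC_ACQUIRE);

	if (!sketch_desc_is_avail(flags, wrap))
		return 0;
	out->addr = d->addr;
	out->len = d->len;
	out->id = d->id;
	out->flags = flags;
	return 1;
}
```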
>>>>
>>>> [1] https://patchwork.dpdk.org/patch/49109/
>>>>
>>>> Signed-off-by: Joyce Kong <[email protected]>
>>>> Reviewed-by: Gavin Hu <[email protected]>
>>>> Reviewed-by: Phil Yang <[email protected]>
>>>> ---
>>>>  drivers/net/virtio/virtio_rxtx.c                 | 13 +++++++------
>>>>  drivers/net/virtio/virtio_user/virtio_user_dev.c |  6 +++++-
>>>>  drivers/net/virtio/virtqueue.h                   | 11 +++++++++++
>>>>  lib/librte_vhost/vhost.h                         |  2 +-
>>>>  lib/librte_vhost/virtio_net.c                    | 11 +++++------
>>>>  5 files changed, 29 insertions(+), 14 deletions(-)
>>>
>>> Reviewed-by: Maxime Coquelin <[email protected]>
>>>
>>> Thanks,
>>> Maxime
