Thanks Yinan for reporting the regression and Gavin for the analysis.

On 9/10/19 11:48 AM, Gavin Hu (Arm Technology China) wrote:
> Hi Yinan,
>
> We have done a comparative analysis and found that, with the old code,
> the if(weak_barriers) and else branches were optimized away on x86
> because rte_smp_wmb and rte_cio_wmb are identical there.
> http://git.dpdk.org/dpdk/tree/drivers/net/virtio/virtqueue.h#n49
> With the new code (Joyce's patches applied), the branches are not
> optimized away, which requires additional CPU cycles and causes a slight
> degradation on x86.
>
> The patches improved performance on aarch64 by about 9%, as indicated in
> the cover letter. While I am thinking over a solution to the degradation
> on x86, could you help answer:
> 1. Is rte_cio_wmb sufficient for the non-weak-barrier case (HW
>    offloading)? I ask because I see that Intel NIC PMDs almost never use
>    it; rte_wmb is more widely used to notify the NIC device. Is there any
>    difference between a virtio-ring-compatible SmartNIC device (or vDPA?)
>    and i40e-like devices?
> 2. If rte_cio_wmb is not sufficient for this case and must be replaced by
>    stronger barriers, like sfence, then the compiler will not optimize
>    the branches away anyway, and the problem becomes the correct use of
>    barriers rather than the degradation.
>
> Any comments are welcome!
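Gavin's point about the branches can be illustrated with a minimal C sketch. It assumes, as Gavin states, that rte_smp_wmb and rte_cio_wmb are identical on x86 (both reduce to a compiler barrier). The macros and function names below are hypothetical stand-ins written for illustration, not DPDK's definitions and not the code from Joyce's patches; the sketch uses the GCC/Clang `__atomic` builtins.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins: on x86 both write barriers reduce to a
 * compiler barrier, so SMP_WMB() and CIO_WMB() expand identically. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")
#define SMP_WMB() COMPILER_BARRIER()
#define CIO_WMB() COMPILER_BARRIER()

/* Old style (hypothetical helper): pick a two-way barrier based on
 * weak_barriers, then do a plain store. Both arms are the same code on
 * x86, so the compiler is free to drop the weak_barriers test. */
static void publish_flags_old(volatile uint16_t *flags, uint16_t val,
                              int weak_barriers)
{
    if (weak_barriers)
        SMP_WMB();
    else
        CIO_WMB();
    *flags = val;
}

/* New style (hypothetical helper): the weak case uses a one-way
 * store-release, the strong case keeps an explicit barrier plus a plain
 * store. The two arms are now different operations to the compiler, so
 * the branch is no longer guaranteed to disappear. */
static void publish_flags_new(uint16_t *flags, uint16_t val,
                              int weak_barriers)
{
    if (weak_barriers) {
        __atomic_store_n(flags, val, __ATOMIC_RELEASE);
    } else {
        CIO_WMB();
        *(volatile uint16_t *)flags = val;
    }
}
```

On x86 a release store is typically emitted as an ordinary store, so both arms of the new form may produce similar instructions; the sketch only shows why the compiler can no longer treat them as one, which is consistent with the extra branch Gavin observed.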
It may be worth having Yinan try rte_wmb instead of rte_cio_wmb without
the series applied, just to confirm that this is caused by the extra
branch.

Maxime

> Best Regards,
> Gavin
>
>> -----Original Message-----
>> From: Wang, Yinan <[email protected]>
>> Sent: Tuesday, September 10, 2019 11:54 AM
>> To: Maxime Coquelin <[email protected]>; Joyce Kong (Arm
>> Technology China) <[email protected]>; [email protected]
>> Cc: nd <[email protected]>; Bie, Tiwei <[email protected]>; Wang, Zhihong
>> <[email protected]>; [email protected]; Wang, Xiao W
>> <[email protected]>; Liu, Yong <[email protected]>;
>> [email protected]; Honnappa Nagarahalli
>> <[email protected]>; Gavin Hu (Arm Technology China)
>> <[email protected]>
>> Subject: RE: [dpdk-dev] [PATCH v3 1/2] virtio: one way barrier for packed
>> vring desc avail flags
>>
>>
>> Hi Joyce,
>>
>> I just tested the performance impact of your patch set on code base
>> commit d03d8622db48918d14bfe805641b1766ecc40088. After applying your v3
>> patch set, seven paths of the vhost/virtio PVP test show a performance
>> drop, as below:
>>
>> PVP vhost/virtio 1c1q test            Before patch   After patch
>> test_perf_pvp_inorder_mergeable          7.603          7.474
>> test_perf_pvp_inorder_no_mergeable       7.642          7.525
>> test_perf_pvp_mergeable                  7.556          7.431
>> test_perf_pvp_normal                     7.554          7.478
>> test_perf_pvp_vector_rx                  7.581          7.469
>> test_perf_pvp_virtio11_mergeable         7.068          6.905
>> test_perf_pvp_virtio11_normal            7.088          6.888
>>
>> Thanks,
>> Yinan
>>
>>> -----Original Message-----
>>> From: dev [mailto:[email protected]] On Behalf Of Maxime Coquelin
>>> Sent: Monday, September 9, 2019 6:10 PM
>>> To: Joyce Kong <[email protected]>; [email protected]
>>> Cc: [email protected]; Bie, Tiwei <[email protected]>; Wang, Zhihong
>>> <[email protected]>; [email protected]; Wang, Xiao W
>>> <[email protected]>; Liu, Yong <[email protected]>;
>>> [email protected]; [email protected];
>>> [email protected]
>>> Subject: Re: [dpdk-dev] [PATCH v3 1/2] virtio: one way barrier for packed
>>> vring desc avail flags
>>>
>>>
>>>
>>> On 9/9/19 11:14 AM, Joyce Kong wrote:
>>>> In case VIRTIO_F_ORDER_PLATFORM(36) is not negotiated, the frontend
>>>> and backend are assumed to be implemented in software, that is, they
>>>> can run on identical CPUs in an SMP configuration.
>>>> Thus a weak form of memory barriers like rte_smp_r/wmb, rather than
>>>> rte_cio_r/wmb, is sufficient for this case (vq->hw->weak_barriers == 1)
>>>> and yields better performance.
>>>> For the above case, this patch yields even better performance by
>>>> replacing the two-way barriers with C11 one-way barriers for the avail
>>>> flags in the packed ring.
>>>>
>>>> Meanwhile, a read barrier is required to ensure ordering between reads
>>>> of a descriptor's flags and its contents [1]. With C11, a load-acquire
>>>> can enforce this ordering instead of an rmb barrier.
>>>>
>>>> [1] https://patchwork.dpdk.org/patch/49109/
>>>>
>>>> Signed-off-by: Joyce Kong <[email protected]>
>>>> Reviewed-by: Gavin Hu <[email protected]>
>>>> Reviewed-by: Phil Yang <[email protected]>
>>>> ---
>>>>  drivers/net/virtio/virtio_rxtx.c                 | 13 +++++++------
>>>>  drivers/net/virtio/virtio_user/virtio_user_dev.c |  6 +++++-
>>>>  drivers/net/virtio/virtqueue.h                   | 11 +++++++++++
>>>>  lib/librte_vhost/vhost.h                         |  2 +-
>>>>  lib/librte_vhost/virtio_net.c                    | 11 +++++------
>>>>  5 files changed, 29 insertions(+), 14 deletions(-)
>>>
>>> Reviewed-by: Maxime Coquelin <[email protected]>
>>>
>>> Thanks,
>>> Maxime
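The one-way barrier idea in Joyce's commit message can be sketched in C for the weak-barrier (software backend) case. The descriptor layout and the AVAIL/USED flag bits (7 and 15) follow the virtio 1.1 packed ring; the helper names are hypothetical, the sketch uses GCC/Clang `__atomic` builtins, and it is not the code from the patch. The producer fills the descriptor body and then publishes it with a store-release on `flags`, in place of "write body; rte_smp_wmb(); write flags". The consumer reads `flags` with a load-acquire, in place of "read flags; rte_smp_rmb(); read body".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Packed-ring descriptor layout (virtio 1.1). */
struct packed_desc {
    uint64_t addr;
    uint32_t len;
    uint16_t id;
    uint16_t flags;
};

#define DESC_F_AVAIL (1u << 7)
#define DESC_F_USED  (1u << 15)

/* Hypothetical producer helper: plain stores for the body, then a
 * store-release on flags. The release keeps the body stores from being
 * reordered after the flags store. An available descriptor has
 * AVAIL == wrap_counter and USED != wrap_counter. */
static void desc_make_avail(struct packed_desc *d, uint64_t addr,
                            uint32_t len, uint16_t id, bool wrap_counter)
{
    uint16_t flags = wrap_counter ? DESC_F_AVAIL : DESC_F_USED;

    d->addr = addr;
    d->len = len;
    d->id = id;
    __atomic_store_n(&d->flags, flags, __ATOMIC_RELEASE);
}

/* Hypothetical consumer helper: load-acquire on flags, then read the
 * body. The acquire keeps the body reads from being reordered before
 * the flags read. Returns false if the descriptor is not available. */
static bool desc_fetch_avail(struct packed_desc *d, bool wrap_counter,
                             struct packed_desc *out)
{
    uint16_t flags = __atomic_load_n(&d->flags, __ATOMIC_ACQUIRE);
    bool avail = (flags & DESC_F_AVAIL) != 0;
    bool used = (flags & DESC_F_USED) != 0;

    if (avail != wrap_counter || used == wrap_counter)
        return false;

    out->addr = d->addr;
    out->len = d->len;
    out->id = d->id;
    out->flags = flags;
    return true;
}
```

Each side needs only a one-way ordering constraint on the single `flags` access rather than a full barrier between accesses; per the commit message, the non-weak-barrier (HW offload) case is not what this sketch covers.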

