Ping. Could you please help review this patch?

Thanks,
Ruifeng
> -----Original Message-----
> From: Ruifeng Wang <ruifeng.w...@arm.com>
> Sent: Tuesday, January 4, 2022 11:01 AM
> To: ma...@nvidia.com; viachesl...@nvidia.com
> Cc: dev@dpdk.org; Honnappa Nagarahalli
> <honnappa.nagaraha...@arm.com>; sta...@dpdk.org; nd <n...@arm.com>;
> Ruifeng Wang <ruifeng.w...@arm.com>
> Subject: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector path
>
> In NEON vector PMD, vector load loads two contiguous 8B of descriptor data
> into vector register. Given vector load ensures no 16B atomicity, read of the
> word that includes op_own field could be reordered after read of other
> words. In this case, some words could contain invalid data.
>
> Reloaded qword0 after read barrier to update vector register. This ensures
> that the fetched data is correct.
>
> Testpmd single core test on N1SDP/ThunderX2 showed no performance
> drop.
>
> Fixes: 1742c2d9fab0 ("net/mlx5: fix synchronization on polling Rx
> completions")
> Cc: sta...@dpdk.org
>
> Signed-off-by: Ruifeng Wang <ruifeng.w...@arm.com>
> ---
>  drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> index b1d16baa61..b1ec615b51 100644
> --- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> +++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> @@ -647,6 +647,14 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq,
> volatile struct mlx5_cqe *cq,
>  		c0 = vld1q_u64((uint64_t *)(p0 + 48));
>  		/* Synchronize for loading the rest of blocks. */
>  		rte_io_rmb();
> +		/* B.0 (CQE 3) reload lower half of the block. */
> +		c3 = vld1q_lane_u64((uint64_t *)(p3 + 48), c3, 0);
> +		/* B.0 (CQE 2) reload lower half of the block. */
> +		c2 = vld1q_lane_u64((uint64_t *)(p2 + 48), c2, 0);
> +		/* B.0 (CQE 1) reload lower half of the block. */
> +		c1 = vld1q_lane_u64((uint64_t *)(p1 + 48), c1, 0);
> +		/* B.0 (CQE 0) reload lower half of the block. */
> +		c0 = vld1q_lane_u64((uint64_t *)(p0 + 48), c0, 0);
>  		/* Prefetch next 4 CQEs. */
>  		if (pkts_n - pos >= 2 * MLX5_VPMD_DESCS_PER_LOOP) {
>  			unsigned int next = pos +
> MLX5_VPMD_DESCS_PER_LOOP;
> --
> 2.25.1