Hi,

> -----Original Message-----
> From: Guduri Prathyusha [mailto:gprathyu...@caviumnetworks.com]
> Sent: Thursday, November 2, 2017 2:31 PM
> To: Kantecki, Tomasz <tomasz.kante...@intel.com>
> Cc: jianbo....@arm.com; guduriprathyu...@gmail.com; Ananyev, Konstantin
> <konstantin.anan...@intel.com>; dev@dpdk.org; Guduri
> Prathyusha <gprathyu...@caviumnetworks.com>
> Subject: [dpdk-dev] [PATCH ] examples/l3fwd: fix aliasing in port grouping
>
> With -f-strict-aliasing enabled by default from -O2, gcc > 5.x gives
> undefined behavior in port_groupx4. 'pn' and 'pnum' are two different
> pointers pointing to same chunk of memory and with -f-strict-aliasing the
> pointers are assumed to be pointing to different memory and compiler
> reorders instructions that depend on pnum and pn. This breaks port
> grouping algorithm.
>
> This patch eliminates the usage of union and uses memcpy for copying
> gptbl[v].pnum to pn. memcpy when applied on built_in constant size does
> not call its library implementation but uses appropriate LD and ST
> instructions directly and hence no performance overhead.
>
> Fixes: 569b290cdb36 ("examples/l3fwd: add NEON implementation")
> Fixes: af1694d94bf1 ("examples/l3fwd: fix crash with gcc 5")
> Signed-off-by: Guduri Prathyusha <gprathyu...@caviumnetworks.com>
> ---
>  examples/l3fwd/l3fwd_neon.h | 11 +++--------
>  examples/l3fwd/l3fwd_sse.h  | 11 +++--------
>  2 files changed, 6 insertions(+), 16 deletions(-)
>
> diff --git a/examples/l3fwd/l3fwd_neon.h b/examples/l3fwd/l3fwd_neon.h
> index 4bc161394..10a602a04 100644
> --- a/examples/l3fwd/l3fwd_neon.h
> +++ b/examples/l3fwd/l3fwd_neon.h
> @@ -100,11 +100,6 @@ static inline uint16_t *
>  port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp, uint16x8_t dp1,
> uint16x8_t dp2)
>  {
> -	union {
> -		uint16_t u16[FWDSTEP + 1];
> -		uint64_t u64;
> -	} *pnum = (void *)pn;
> -
>  	int32_t v;
>  	uint16x8_t mask = {1, 2, 4, 8, 0, 0, 0, 0};
>
> @@ -117,9 +112,9 @@ port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp,
> uint16x8_t dp1,
>
>  	/* if dest port value has changed. */
>  	if (v != GRPMSK) {
> -		pnum->u64 = gptbl[v].pnum;
> -		pnum->u16[FWDSTEP] = 1;
> -		lp = pnum->u16 + gptbl[v].idx;
> +		rte_memcpy(pn, &gptbl[v].pnum, sizeof(gptbl[v].pnum));
> +		pn[FWDSTEP] = 1;
> +		lp = pn + gptbl[v].idx;
>  	}
>
>  	return lp;
> diff --git a/examples/l3fwd/l3fwd_sse.h b/examples/l3fwd/l3fwd_sse.h
> index 831760f02..79a71d77e 100644
> --- a/examples/l3fwd/l3fwd_sse.h
> +++ b/examples/l3fwd/l3fwd_sse.h
> @@ -98,11 +98,6 @@ processx4_step3(struct rte_mbuf *pkt[FWDSTEP], uint16_t
> dst_port[FWDSTEP])
>  static inline uint16_t *
>  port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp, __m128i dp1, __m128i
> dp2)
>  {
> -	union {
> -		uint16_t u16[FWDSTEP + 1];
> -		uint64_t u64;
> -	} *pnum = (void *)pn;
> -
>  	int32_t v;
>
>  	dp1 = _mm_cmpeq_epi16(dp1, dp2);
> @@ -114,9 +109,9 @@ port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp,
> __m128i dp1, __m128i dp2)
>
>  	/* if dest port value has changed. */
>  	if (v != GRPMSK) {
> -		pnum->u64 = gptbl[v].pnum;
> -		pnum->u16[FWDSTEP] = 1;
> -		lp = pnum->u16 + gptbl[v].idx;
> +		rte_memcpy(pn, &gptbl[v].pnum, sizeof(gptbl[v].pnum));
> +		pn[FWDSTEP] = 1;
> +		lp = pn + gptbl[v].idx;

Could you explain a bit more here: exactly which instructions were
reordered, and what kind of problems did that cause? Especially on IA?
In any case, I don't think rte_memcpy is a good choice here: it is a
huge inline function, far too much for copying a single 64-bit variable.

Konstantin

>  	}
>
>  	return lp;
> --
> 2.14.1
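
For illustration only, here is a minimal sketch of what the copy could look like with plain memcpy() from <string.h> rather than rte_memcpy(). It is not taken from the patch. `struct grp_entry` and `store_group()` are hypothetical names standing in for a gptbl[] entry and the body of the `if (v != GRPMSK)` branch, and FWDSTEP is assumed to be 4 so that the 64-bit value covers exactly four uint16_t slots.

```c
#include <stdint.h>
#include <string.h>

#define FWDSTEP 4 /* assumed value; must match sizeof(uint64_t) / sizeof(uint16_t) */

/* Hypothetical stand-in for one gptbl[] entry; only the two fields
 * used in the quoted hunk are modeled here. */
struct grp_entry {
	uint64_t pnum; /* four prebuilt uint16_t values packed into 64 bits */
	int32_t idx;   /* index of the new last-updated element */
};

/* Hypothetical helper sketching the body of the "if (v != GRPMSK)" branch. */
static inline uint16_t *
store_group(uint16_t pn[FWDSTEP + 1], const struct grp_entry *e)
{
	/* Fixed-size byte copy: the uint16_t array is never accessed
	 * through a uint64_t lvalue, so no aliasing assumption is violated. */
	memcpy(pn, &e->pnum, sizeof(e->pnum));
	pn[FWDSTEP] = 1;
	return pn + e->idx;
}
```

Whether the compiler turns this 8-byte memcpy() into a single load/store pair, as the commit message expects, depends on the toolchain and optimization level, so it would be worth checking the generated code on both targets.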