Thanks for the feedback, and my apologies for the delayed reply.

> -----Original Message-----
> From: Konstantin Ananyev <konstantin.v.anan...@yandex.ru>
> Sent: Sunday, August 7, 2022 5:56 PM
> To: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>; Amit Prakash
> Shukla <amitpraka...@marvell.com>
> Cc: dev@dpdk.org; Jerin Jacob Kollanukkaran <jer...@marvell.com>;
> sta...@dpdk.org; nd <n...@arm.com>
> Subject: [EXT] Re: [PATCH] ring: compilation fix with GCC-12
>
> 06/08/2022 19:35, Honnappa Nagarahalli writes:
> > <snip>
> >
> >>
> >> GCC 12 raises the following warning:
> >>
> >> In function '__rte_ring_dequeue_elems_128',
> >>     inlined from '__rte_ring_dequeue_elems' at
> >> ../lib/ring/rte_ring_elem_pvt.h:262:3,
> >>     inlined from '__rte_ring_do_hts_dequeue_elem' at
> >> ../lib/ring/rte_ring_hts_elem_pvt.h:237:3,
> >>     inlined from 'rte_ring_mc_hts_dequeue_bulk_elem' at
> >> ../lib/ring/rte_ring_hts.h:83:9,
> >>     inlined from 'rte_ring_dequeue_bulk_elem' at
> >> ../lib/ring/rte_ring_elem.h:391:10,
> >>     inlined from 'rte_ring_dequeue_elem' at
> >> ../lib/ring/rte_ring_elem.h:476:9,
> >>     inlined from 'rte_ring_dequeue' at
> >> ../lib/ring/rte_ring.h:463:9,
> >>     inlined from 'rxa_intr_ring_dequeue' at
> >> ../lib/eventdev/rte_event_eth_rx_adapter.c:1196:10:
> >> ../lib/ring/rte_ring_elem_pvt.h:234:25: error: 'memcpy' writing
> >> 32 bytes into a region of size 8 overflows the destination
> >> [-Werror=stringop-overflow=]
> >>   234 | memcpy((void *)(obj + i), (void *)(ring + idx), 32);
> >>       | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >>
> >> Replacing memcpy with rte_memcpy fixes the GCC-12 compilation issue.
> > Any reason why this replacement fixes the problem?
> > Do you have any performance numbers with this change?
> >
> >> Also it would be better to change to rte_memcpy as the function is
> >> called in fastpath.
> > On Arm platforms, memcpy in the later versions has the best performance.
>
> I agree with Honnappa, it is better to keep memcpy() here.
> Actually what is strange - why it ends up in
> __rte_ring_dequeue_elems_128() at all?
> Inside rxa_intr_ring_dequeue() we clearly doing: rte_ring_dequeue(), which
> should boil down to ___rte_ring_dequeue_elems_64().
> it should go to __rte_ring_dequeue_elems_128() at all.
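A simplified, hypothetical model of the size-based dispatch described above may help. The names dequeue_elems(), copy_elems_64(), copy_elems_128(), copy_elems_32() and dequeue_one_ptr() are illustrative stand-ins, not DPDK functions, and the branch structure is only assumed to resemble __rte_ring_dequeue_elems(). Because the element size passed on behalf of a pointer dequeue is sizeof(void *), a compile-time constant, only one branch can execute for that call site.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Records which copy helper ran, so the dispatch can be observed. */
static int last_path;

/* Hypothetical stand-ins for the per-size copy helpers. */
static inline void copy_elems_64(void *dst, const void *src, uint32_t n)
{
	last_path = 64;
	memcpy(dst, src, (size_t)n * 8);
}

static inline void copy_elems_128(void *dst, const void *src, uint32_t n)
{
	last_path = 128;
	memcpy(dst, src, (size_t)n * 16);
}

static inline void copy_elems_32(void *dst, const void *src, uint32_t n)
{
	last_path = 32;
	memcpy(dst, src, (size_t)n * 4);
}

/* Simplified dispatch on the element size. When esize is a compile-time
 * constant at the call site, only one branch can ever execute. */
static inline void dequeue_elems(void *dst, const void *src,
				 uint32_t esize, uint32_t n)
{
	if (esize == 8)
		copy_elems_64(dst, src, n);
	else if (esize == 16)
		copy_elems_128(dst, src, n);
	else {
		/* Generic path: esize is assumed to be a multiple of 4. */
		assert(esize % 4 == 0);
		copy_elems_32(dst, src, n * (esize / 4));
	}
}

/* Hypothetical model of a pointer dequeue: one element of sizeof(void *). */
static inline void dequeue_one_ptr(void **obj_p, void *const *slot)
{
	dequeue_elems(obj_p, slot, sizeof(void *), 1);
}
```

On a 64-bit target, dequeue_one_ptr() always lands in copy_elems_64(); the 16-byte branch is still part of the function body, it is simply never taken for this element size.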
I agree. After taking a closer look and running a few experiments, it
should not end up in __rte_ring_dequeue_elems_128() at all. The element
size passed to rte_ring_dequeue_elem() (sizeof(void *)) is evaluated at
compile time, and in this case it evaluates to 8 bytes, so
__rte_ring_dequeue_elems_128() should not be on the path. This looks
more like a GCC 12 bug.

>
> Another q - is this warning happens only on arm platforms?

The warning is also observed on x86 with the debug build type:

    meson --werror --buildtype=debug build

>
> >
> >>
> >> Bugzilla ID: 1062
> >> Fixes: 1fc73390bcf5 ("ring: refactor exported headers")
> >> Cc: sta...@dpdk.org
> >>
> >> Signed-off-by: Amit Prakash Shukla <amitpraka...@marvell.com>
> >> ---
> >>  lib/ring/rte_ring_elem_pvt.h | 18 ++++++++++--------
> >>  1 file changed, 10 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/lib/ring/rte_ring_elem_pvt.h b/lib/ring/rte_ring_elem_pvt.h
> >> index 83788c56e6..3d85b13333 100644
> >> --- a/lib/ring/rte_ring_elem_pvt.h
> >> +++ b/lib/ring/rte_ring_elem_pvt.h
> >> @@ -10,6 +10,8 @@
> >>  #ifndef _RTE_RING_ELEM_PVT_H_
> >>  #define _RTE_RING_ELEM_PVT_H_
> >>
> >> +#include <rte_memcpy.h>
> >> +
> >>  static __rte_always_inline void
> >>  __rte_ring_enqueue_elems_32(struct rte_ring *r, const uint32_t size,
> >>  		uint32_t idx, const void *obj_table, uint32_t n)
> >> @@ -97,20 +99,20 @@ __rte_ring_enqueue_elems_128(struct rte_ring *r, uint32_t prod_head,
> >>  	const rte_int128_t *obj = (const rte_int128_t *)obj_table;
> >>  	if (likely(idx + n <= size)) {
> >>  		for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
> >> -			memcpy((void *)(ring + idx),
> >> +			rte_memcpy((void *)(ring + idx),
> >>  				(const void *)(obj + i), 32);
> >>  		switch (n & 0x1) {
> >>  		case 1:
> >> -			memcpy((void *)(ring + idx),
> >> +			rte_memcpy((void *)(ring + idx),
> >>  				(const void *)(obj + i), 16);
> >>  		}
> >>  	} else {
> >>  		for (i = 0; idx < size; i++, idx++)
> >> -			memcpy((void *)(ring + idx),
> >> +			rte_memcpy((void *)(ring + idx),
> >>  				(const void *)(obj + i), 16);
> >>  		/* Start at the beginning */
> >>  		for (idx = 0; i < n; i++, idx++)
> >> -			memcpy((void *)(ring + idx),
> >> +			rte_memcpy((void *)(ring + idx),
> >>  				(const void *)(obj + i), 16);
> >>  	}
> >>  }
> >> @@ -231,17 +233,17 @@ __rte_ring_dequeue_elems_128(struct rte_ring *r, uint32_t prod_head,
> >>  	rte_int128_t *obj = (rte_int128_t *)obj_table;
> >>  	if (likely(idx + n <= size)) {
> >>  		for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
> >> -			memcpy((void *)(obj + i), (void *)(ring + idx), 32);
> >> +			rte_memcpy((void *)(obj + i), (void *)(ring + idx), 32);
> >>  		switch (n & 0x1) {
> >>  		case 1:
> >> -			memcpy((void *)(obj + i), (void *)(ring + idx), 16);
> >> +			rte_memcpy((void *)(obj + i), (void *)(ring + idx), 16);
> >>  		}
> >>  	} else {
> >>  		for (i = 0; idx < size; i++, idx++)
> >> -			memcpy((void *)(obj + i), (void *)(ring + idx), 16);
> >> +			rte_memcpy((void *)(obj + i), (void *)(ring + idx), 16);
> >>  		/* Start at the beginning */
> >>  		for (idx = 0; i < n; i++, idx++)
> >> -			memcpy((void *)(obj + i), (void *)(ring + idx), 16);
> >> +			rte_memcpy((void *)(obj + i), (void *)(ring + idx), 16);
> >>  	}
> >>  }
> >>
> >> --
> >> 2.25.1
> >
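For reference, the 128-bit dequeue loop touched by the patch can be modelled in isolation. This is a simplified sketch, not the DPDK code: elem128_t and dequeue_elems_128() are hypothetical stand-ins for rte_int128_t and __rte_ring_dequeue_elems_128(), and the ring is a plain array indexed from slot 0. The paired 32-byte copy writes two elements at once, so the destination needs room for at least two 16-byte slots. A destination holding a single 8-byte void *, presumably what rte_ring_dequeue() hands down, matches the warning's "writing 32 bytes into a region of size 8".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte element, standing in for rte_int128_t. */
typedef struct {
	uint64_t lo;
	uint64_t hi;
} elem128_t;

/* Copy n 16-byte elements out of a ring of `size` slots, starting at
 * slot idx and wrapping to slot 0 at the end of the ring. */
static void dequeue_elems_128(const elem128_t *ring, uint32_t size,
			      uint32_t idx, elem128_t *obj, uint32_t n)
{
	uint32_t i;

	assert(idx < size); /* idx must name a valid ring slot */

	if (idx + n <= size) {
		/* No wraparound: copy two elements (32 bytes) per iteration.
		 * This requires obj to have room for at least two elements. */
		for (i = 0; i < (n & ~0x1u); i += 2, idx += 2)
			memcpy(obj + i, ring + idx, 32);
		/* Copy the trailing odd element, if any. */
		if (n & 0x1)
			memcpy(obj + i, ring + idx, 16);
	} else {
		/* Copy up to the end of the ring... */
		for (i = 0; idx < size; i++, idx++)
			memcpy(obj + i, ring + idx, 16);
		/* ...then start at the beginning. */
		for (idx = 0; i < n; i++, idx++)
			memcpy(obj + i, ring + idx, 16);
	}
}
```

With a 4-slot ring, dequeuing 3 elements from slot 3 yields slots 3, 0 and 1, exercising the wraparound branch.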