> -----Original Message-----
> From: Ananyev, Konstantin <konstantin.anan...@intel.com>
> Sent: Tuesday, March 24, 2020 7:04 PM
> To: Phil Yang <phil.y...@arm.com>; Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>;
> tho...@monjalon.net; Van Haaren, Harry <harry.van.haa...@intel.com>;
> step...@networkplumber.org; maxime.coque...@redhat.com; dev@dpdk.org;
> Richardson, Bruce <bruce.richard...@intel.com>
> Cc: david.march...@redhat.com; jer...@marvell.com; hemant.agra...@nxp.com;
> Gavin Hu <gavin...@arm.com>; Ruifeng Wang <ruifeng.w...@arm.com>;
> Joyce Kong <joyce.k...@arm.com>; nd <n...@arm.com>; nd <n...@arm.com>; nd <n...@arm.com>
> Subject: RE: [PATCH v3 06/12] ipsec: optimize with c11 atomic for sa outbound sqn update
>
> Hi Phil,
>
> <snip>
>
> > > > > Subject: RE: [PATCH v3 06/12] ipsec: optimize with c11 atomic for sa outbound
> > > > > sqn update
> > > > >
> > > > > Hi Phil,
> > > > >
> > > > > > For SA outbound packets, rte_atomic64_add_return is used to generate
> > > > > > SQN atomically. This introduced an unnecessary full barrier by calling
> > > > > > the '__sync' builtin implemented rte_atomic_XX API on aarch64. This
> > > > > > patch optimized it with c11 atomic and eliminated the expensive
> > > > > > barrier for aarch64.
> > > > > >
> > > > > > Signed-off-by: Phil Yang <phil.y...@arm.com>
> > > > > > Reviewed-by: Ruifeng Wang <ruifeng.w...@arm.com>
> > > > > > Reviewed-by: Gavin Hu <gavin...@arm.com>
> > > > > > ---
> > > > > >  lib/librte_ipsec/ipsec_sqn.h | 3 ++-
> > > > > >  lib/librte_ipsec/sa.h        | 2 +-
> > > > > >  2 files changed, 3 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/lib/librte_ipsec/ipsec_sqn.h b/lib/librte_ipsec/ipsec_sqn.h
> > > > > > index 0c2f76a..e884af7 100644
> > > > > > --- a/lib/librte_ipsec/ipsec_sqn.h
> > > > > > +++ b/lib/librte_ipsec/ipsec_sqn.h
> > > > > > @@ -128,7 +128,8 @@ esn_outb_update_sqn(struct rte_ipsec_sa *sa, uint32_t *num)
> > > > > >
> > > > > >  	n = *num;
> > > > > >  	if (SQN_ATOMIC(sa))
> > > > > > -		sqn = (uint64_t)rte_atomic64_add_return(&sa->sqn.outb.atom, n);
> > > > > > +		sqn = __atomic_add_fetch(&sa->sqn.outb.atom, n,
> > > > > > +			__ATOMIC_RELAXED);
> > > > >
> > > > > One generic thing to note:
> > > > > clang for i686 will in some cases generate a proper function call for
> > > > > 64-bit __atomic builtins (gcc seems to always generate cmpxchg8b for
> > > > > such cases).
> > > > > Does anyone consider it a potential problem?
> > > > > It is probably not a big deal, but I would like to know the broader opinion.
> > > >
> > > > I had looked at this some time back for GCC. The function call is generated
> > > > only if the underlying platform does not support the atomic instructions
> > > > for the operand size. Otherwise, gcc generates the instructions directly.
> > > > I would think the behavior would be the same for clang.
> > >
> > > From what I see, not really.
> > > As an example:
> > >
> > > $ cat tatm11.c
> > > #include <stdint.h>
> > >
> > > struct x {
> > > 	uint64_t v __attribute__((aligned(8)));
> > > };
> > >
> > > uint64_t
> > > ffxadd1(struct x *x, uint32_t n, uint32_t m)
> > > {
> > > 	return __atomic_add_fetch(&x->v, n, __ATOMIC_RELAXED);
> > > }
> > >
> > > uint64_t
> > > ffxadd11(uint64_t *v, uint32_t n, uint32_t m)
> > > {
> > > 	return __atomic_add_fetch(v, n, __ATOMIC_RELAXED);
> > > }
> > >
> > > gcc for i686 will generate code with cmpxchg8b for both cases.
> > > clang will generate cmpxchg8b for ffxadd1() - when the data is explicitly
> > > 8B aligned - but will emit a function call for ffxadd11().
> >
> > I guess your testbed is an i386 platform. However, what I see here is different.
> >
> > Testbed i686: Ubuntu 18.04.4 LTS / GCC 8.3 / Clang 9.0.0-2
> > Both Clang and GCC for i686 generate code with xadd for these two cases.
>
> I suppose you meant x86_64 here (-m64), right?
Yes. It is x86_64 here.

> > Testbed i386: Ubuntu 16.04 LTS (libatomic installed) / GCC 5.4.0 / Clang 4.0.0
> > GCC will generate code with cmpxchg8b for both cases.
> > Clang-generated code emits a function call for both cases.
>
> That's exactly what I am talking about above.
> x86_64 (64-bit binary) - no function calls with either gcc or clang.
> i686 (32-bit binary) - no function calls with gcc, function calls with clang
> when explicit alignment is not specified.
>
> As I said in my initial email, that's probably not a big deal -
> from what I was told so far we don't officially support clang for IA-32,
> and I don't know whether anyone uses it at all right now.
> Though if someone thinks it is a potential problem here,
> it is better to flag it at an early stage.
> So once again, my questions to the community:
> 1/ Does anyone build/use DPDK with i686-clang?
> 2/ If so, can those people try to evaluate
>    how big a perf drop it would cause for them?
> 3/ Is there an option to switch to i686-gcc (the supported one)?
>
> Konstantin
>
> > > > > >  	else {
> > > > > >  		sqn = sa->sqn.outb.raw + n;
> > > > > >  		sa->sqn.outb.raw = sqn;
> > > > > >
> > > > > > diff --git a/lib/librte_ipsec/sa.h b/lib/librte_ipsec/sa.h
> > > > > > index d22451b..cab9a2e 100644
> > > > > > --- a/lib/librte_ipsec/sa.h
> > > > > > +++ b/lib/librte_ipsec/sa.h
> > > > > > @@ -120,7 +120,7 @@ struct rte_ipsec_sa {
> > > > > >  	 */
> > > > > >  	union {
> > > > > >  		union {
> > > > > > -			rte_atomic64_t atom;
> > > > > > +			uint64_t atom;
> > > > > >  			uint64_t raw;
> > > > > >  		} outb;
> > > > >
> > > > > If we don't need rte_atomic64 here anymore, then I think we can collapse
> > > > > the union to just:
> > > > > 	uint64_t outb;
> > > > >
> > > > > >  	struct {
> > > > > > --
> > > > > > 2.7.4