> -----Original Message-----
> From: dev <dev-boun...@dpdk.org> On Behalf Of Ananyev, Konstantin
> Sent: Tuesday, March 24, 2020 1:10 PM
> To: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>; Phil Yang 
> <phil.y...@arm.com>; tho...@monjalon.net; Van Haaren,
> Harry <harry.van.haa...@intel.com>; step...@networkplumber.org; 
> maxime.coque...@redhat.com; dev@dpdk.org; Richardson, Bruce
> <bruce.richard...@intel.com>
> Cc: david.march...@redhat.com; jer...@marvell.com; hemant.agra...@nxp.com; 
> Gavin Hu <gavin...@arm.com>; Ruifeng Wang
> <ruifeng.w...@arm.com>; Joyce Kong <joyce.k...@arm.com>; nd <n...@arm.com>; 
> nd <n...@arm.com>
> Subject: Re: [dpdk-dev] [PATCH v3 06/12] ipsec: optimize with c11 atomic for 
> sa outbound sqn update
> 
> 
> > > > > > For SA outbound packets, rte_atomic64_add_return is used to
> > > > > > generate SQN atomically. This introduced an unnecessary full
> > > > > > barrier by calling the '__sync' builtin implemented rte_atomic_XX
> > > > > > API on aarch64. This patch optimized it with c11 atomic and
> > > > > > eliminated the expensive barrier for aarch64.
> > > > > >
> > > > > > Signed-off-by: Phil Yang <phil.y...@arm.com>
> > > > > > Reviewed-by: Ruifeng Wang <ruifeng.w...@arm.com>
> > > > > > Reviewed-by: Gavin Hu <gavin...@arm.com>
> > > > > > ---
> > > > > >  lib/librte_ipsec/ipsec_sqn.h | 3 ++-
> > > > > >  lib/librte_ipsec/sa.h        | 2 +-
> > > > > >  2 files changed, 3 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > > diff --git a/lib/librte_ipsec/ipsec_sqn.h b/lib/librte_ipsec/ipsec_sqn.h
> > > > > > > index 0c2f76a..e884af7 100644
> > > > > > --- a/lib/librte_ipsec/ipsec_sqn.h
> > > > > > +++ b/lib/librte_ipsec/ipsec_sqn.h
> > > > > > @@ -128,7 +128,8 @@ esn_outb_update_sqn(struct rte_ipsec_sa *sa,
> > > > > > uint32_t *num)
> > > > > >
> > > > > >     n = *num;
> > > > > >     if (SQN_ATOMIC(sa))
> > > > > > > -           sqn = (uint64_t)rte_atomic64_add_return(&sa->sqn.outb.atom, n);
> > > > > > +           sqn = __atomic_add_fetch(&sa->sqn.outb.atom, n,
> > > > > > +                   __ATOMIC_RELAXED);
> > > > >
> > > > > > One generic thing to note:
> > > > > > clang for i686 in some cases will generate a proper function call
> > > > > > for 64-bit __atomic builtins (gcc seems to always generate cmpxchg8b
> > > > > > for such cases).
> > > > > > Does anyone consider it a potential problem?
> > > > > > It is probably not a big deal, but I would like to know the broader opinion.
> > > > > I had looked at this some time back for GCC. The function call is
> > > > > generated only if the underlying platform does not support the atomic
> > > > > instructions for the operand size. Otherwise, gcc generates the
> > > > > instructions directly.
> > > > > I would think the behavior would be the same for clang.
> > >
> > > From what I see not really.
> > > As an example:
> > >
> > > $ cat tatm11.c
> > > #include <stdint.h>
> > >
> > > > struct x {
> > > >         uint64_t v __attribute__((aligned(8)));
> > > > };
> > > >
> > > > uint64_t
> > > > ffxadd1(struct x *x, uint32_t n, uint32_t m)
> > > > {
> > > >         return __atomic_add_fetch(&x->v, n, __ATOMIC_RELAXED);
> > > > }
> > > >
> > > > uint64_t
> > > > ffxadd11(uint64_t *v, uint32_t n, uint32_t m)
> > > > {
> > > >         return __atomic_add_fetch(v, n, __ATOMIC_RELAXED);
> > > > }
> > >
> > > > gcc for i686 will generate code with cmpxchg8b for both cases.
> > > > clang will generate cmpxchg8b for ffxadd1(), where the data is explicitly
> > > > 8B aligned, but will emit a function call for ffxadd11().
> > Does it require libatomic to be linked in this case?
> 
> Yes, it does.
> In fact, it is the same story even with current dpdk.org master.
> To build i686-native-linuxapp-clang successfully, I have to
> explicitly add EXTRA_LDFLAGS="-latomic".
> To be more specific:
> $ for i in i686-native-linuxapp-clang/lib/*.a; do x=`nm $i | grep __atomic_`; 
> if [[ -n "${x}" ]]; then echo $i; echo $x; fi; done
> i686-native-linuxapp-clang/lib/librte_distributor.a
> U __atomic_load_8 U __atomic_store_8
> i686-native-linuxapp-clang/lib/librte_pmd_opdl_event.a
> U __atomic_load_8 U __atomic_store_8
> i686-native-linuxapp-clang/lib/librte_rcu.a
> U __atomic_compare_exchange_8 U __atomic_load_8
> 
> As there have been no complaints so far, I suspect that
> probably no one is using clang for IA-32 builds.
> 
> > Clang documentation calls out the unaligned case where it would generate
> > the function call [1].
> 
> Seems so, and it treats uint64_t as 4B aligned for IA.
correction: for IA-32
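
To make the alignment point concrete, here is a minimal sketch based on the
tatm11.c example quoted above. The names struct sqn64 and sqn64_add_fetch()
are made up for illustration and are not part of DPDK. The idea is to carry
the 8-byte alignment in the type itself, which is the case where clang was
seen to emit the instruction inline for i686 instead of a libatomic call.
Treat it as an assumption to verify with your own clang version, not as a
guaranteed fix.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical wrapper: forces 8-byte alignment even on IA-32,
 * where a plain uint64_t may be only 4-byte aligned. */
struct sqn64 {
	uint64_t v __attribute__((aligned(8)));
};

/* Fail the build if the alignment attribute did not take effect. */
static_assert(_Alignof(struct sqn64) == 8, "sqn64 must be 8-byte aligned");

/* Relaxed atomic add; returns the value after the addition. */
static inline uint64_t
sqn64_add_fetch(struct sqn64 *s, uint32_t n)
{
	return __atomic_add_fetch(&s->v, n, __ATOMIC_RELAXED);
}
```

Running nm on the resulting object, as in the loop shown earlier, should tell
whether an undefined __atomic_* symbol is still left for the linker.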

> 
> > On aarch64, the atomic instructions need the address to be aligned.
> 
> For that particular case (cmpxchg8b) there is no such restriction on IA-32.
> Again, as I said before, gcc manages to emit code without function calls
> for exactly the same source.
> 
> >
> > [1] https://clang.llvm.org/docs/Toolchain.html#atomics-library
> >
> > >
> > > >
> > > > >
> > > > > >     else {
> > > > > >             sqn = sa->sqn.outb.raw + n;
> > > > > >             sa->sqn.outb.raw = sqn;
> > > > > > > diff --git a/lib/librte_ipsec/sa.h b/lib/librte_ipsec/sa.h
> > > > > > > index d22451b..cab9a2e 100644
> > > > > > --- a/lib/librte_ipsec/sa.h
> > > > > > +++ b/lib/librte_ipsec/sa.h
> > > > > > @@ -120,7 +120,7 @@ struct rte_ipsec_sa {
> > > > > >      */
> > > > > >     union {
> > > > > >             union {
> > > > > > -                   rte_atomic64_t atom;
> > > > > > +                   uint64_t atom;
> > > > > >                     uint64_t raw;
> > > > > >             } outb;
> > > > >
> > > > > If we don't need rte_atomic64 here anymore, then I think we can
> > > > > collapse the union to just:
> > > > > uint64_t outb;
> > > > >
> > > > > >             struct {
> > > > > > --
> > > > > > 2.7.4
