On Wed, May 22, 2024 at 07:57:01PM +0200, Morten Brørup wrote:
> > From: Stephen Hemminger [mailto:step...@networkplumber.org]
> > Sent: Wednesday, 22 May 2024 17.38
> > 
> > On Wed, 22 May 2024 10:31:39 +0200
> > Morten Brørup <m...@smartsharesystems.com> wrote:
> > 
> > > > +/* On 32 bit platform, need to use atomic to avoid load/store
> > tearing */
> > > > +typedef RTE_ATOMIC(uint64_t) rte_counter64_t;
> > >
> > > As shown by Godbolt experiments discussed in a previous thread [2],
> > non-tearing 64 bit counters can be implemented without using atomic
> > instructions on all 32 bit architectures supported by DPDK. So we should
> > use the counter/offset design pattern for RTE_ARCH_32 too.
> > >
> > > [2]:
> > https://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35E9F433@smarts
> > erver.smartshare.dk/
> > 
> > 
> > This code built with -O3 and -m32 on godbolt shows split problem.
> > 
> > #include <stdint.h>
> > 
> > typedef uint64_t rte_counter64_t;
> > 
> > void
> > rte_counter64_add(rte_counter64_t *counter, uint32_t val)
> > {
> >     *counter += val;
> > }
> > …   *counter = val;
> > }
> > 
> > rte_counter64_add:
> >         push    ebx
> >         mov     eax, DWORD PTR [esp+8]
> >         xor     ebx, ebx
> >         mov     ecx, DWORD PTR [esp+12]
> >         add     DWORD PTR [eax], ecx
> >         adc     DWORD PTR [eax+4], ebx
> >         pop     ebx
> >         ret
> > 
> > rte_counter64_read:
> >         mov     eax, DWORD PTR [esp+4]
> >         mov     edx, DWORD PTR [eax+4]
> >         mov     eax, DWORD PTR [eax]
> >         ret
> > rte_counter64_set:
> >         movq    xmm0, QWORD PTR [esp+8]
> >         mov     eax, DWORD PTR [esp+4]
> >         movq    QWORD PTR [eax], xmm0
> >         ret
> 
> Sure, atomic might be required on some 32 bit architectures and/or with some 
> compilers.

in theory i think you should be able to use generic atomics and
depending on the target you get codegen that works. it might be
something more expensive on 32-bit and nothing on 64-bit etc..

what's the damage if we just use atomic generic and relaxed ordering? is
the codegen not optimal?
 
> I envision a variety of 32 bit implementations, optimized for certain 
> architectures/compilers.
> 
> Some of them can provide non-tearing 64 bit load/store, so we should also use 
> the counter/offset design pattern for those.
> 

Reply via email to