On Wed, Jun 16, 2010 at 09:37:02PM -0400, Mathieu Desnoyers wrote:
> * Paul E. McKenney ([email protected]) wrote:
> > On Wed, Jun 16, 2010 at 07:57:55PM -0400, Mathieu Desnoyers wrote:
> > > * Paul E. McKenney ([email protected]) wrote:
> > > > On Wed, Jun 16, 2010 at 05:23:15PM -0400, Mathieu Desnoyers wrote:
> > > > > * Paul E. McKenney ([email protected]) wrote:
> > > > > > Add native support for armv7l.  Other variants of ARM will likely 
> > > > > > require
> > > > > > separate ports.
> > > > > > 
> > > > > > Signed-off-by: Paul E. McKenney <[email protected]>
> > > > > > ---
> > > > > >  configure.ac               |    4 +++
> > > > > >  urcu/arch_armv7l.h         |   59 
> > > > > > ++++++++++++++++++++++++++++++++++++++++++++
> > > > > >  urcu/uatomic_arch_armv7l.h |   48 
> > > > > > +++++++++++++++++++++++++++++++++++
> > > > 
> > > > [ . . . ]
> > > > 
> > > > > > +#ifdef __cplusplus
> > > > > > +extern "C" {
> > > > > > +#endif 
> > > > > > +
> > > > > > +/* xchg */
> > > > > > +#define uatomic_xchg(addr, v) __sync_lock_test_and_set(addr, v)
> > > > > > +
> > > > > > +/* cmpxchg */
> > > > > > +#define uatomic_cmpxchg(addr, old, _new) \
> > > > > > +   __sync_val_compare_and_swap(addr, old, _new)
> > > > > > +
> > > > > > +/* uatomic_add_return */
> > > > > > +#define uatomic_add_return(addr, v) __sync_add_and_fetch(addr, v)
> > > > > 
> > > > > So, do we end up trusting that gcc got the memory barriers right in 
> > > > > the ARM
> > > > > __sync_() primitives ? That sounds unlikely.
> > > > > 
> > > > > I'd vote for surrounding these primitives with smp_mb().
> > > > 
> > > > On ARM, my current belief is that the primitives other than
> > > > __sync_synchronize() and __sync_lock_release() are set up correctly.
> > > > 
> > > > However, I must defer to Paolo and Uli on this.
> > > 
> > > There is nothing like a quick test to see the result:
> > > 
> > > With a arm-linux-cs2009q1-203sb1 scratchbox compiler (gcc 4.3.3, provided 
> > > by
> > > Nokia for the Omap3):
> > > 
> > > arm-none-linux-gnueabi-gcc-4.3.3 (Sourcery G++ Lite 2009q1-203) 4.3.3
> > > 
> > > I compile, with
> > > 
> > > /scratchbox/compilers/arm-linux-cs2009q1-203sb1/bin/arm-none-linux-gnueabi-gcc-4.3.3
> > >  -mcpu=cortex-a9 -mtune=cortex-a9 -O2 -o armtest armtest.c
> > > 
> > > the following program:
> > > 
> > > int a;
> > > 
> > > int
> > > f()
> > > {
> > >         __sync_val_compare_and_swap(&a, 4, 1);
> > >         //__sync_lock_test_and_set(&a, 1);
> > >         //__sync_add_and_fetch(&a, 1);
> > >         //__sync_synchronize();
> > > }
> > > 
> > > int main()
> > > {
> > >         f();
> > > }
> > > 
> > > and get:
> > > 
> > > /scratchbox/compilers/arm-linux-cs2009q1-203sb1/bin/arm-none-linux-gnueabi-objdump
> > >  -S armtest
> > > 
> > > [...]
> > > 
> > > 
> > > 000083cc <f>:
> > >     83cc:       e59f0008        ldr     r0, [pc, #8]    ; 83dc <f+0x10>
> > >     83d0:       e3a01004        mov     r1, #4  ; 0x4
> > >     83d4:       e3a02001        mov     r2, #1  ; 0x1
> > >     83d8:       ea000305        b       8ff4 
> > > <__sync_val_compare_and_swap_4>
> > >     83dc:       00011524        .word   0x00011524
> > > 
> > > [...]
> > > 
> > > 00008ff4 <__sync_val_compare_and_swap_4>:
> > >     8ff4:       e92d41f0        push    {r4, r5, r6, r7, r8, lr}
> > >     8ff8:       e59f8034        ldr     r8, [pc, #52]   ; 9034 
> > > <__sync_val_compare_and_swap_4+0x40>
> > >     8ffc:       e1a06000        mov     r6, r0
> > >     9000:       e1a05001        mov     r5, r1
> > >     9004:       e1a07002        mov     r7, r2
> > >     9008:       e5964000        ldr     r4, [r6]
> > >     900c:       e1a00005        mov     r0, r5
> > >     9010:       e1550004        cmp     r5, r4
> > >     9014:       e1a01007        mov     r1, r7
> > >     9018:       e1a02006        mov     r2, r6
> > >     901c:       1a000002        bne     902c 
> > > <__sync_val_compare_and_swap_4+0x38>
> > >     9020:       e12fff38        blx     r8
> > >     9024:       e3500000        cmp     r0, #0  ; 0x0
> > >     9028:       1afffff6        bne     9008 
> > > <__sync_val_compare_and_swap_4+0x14>
> > >     902c:       e1a00004        mov     r0, r4
> > >     9030:       e8bd81f0        pop     {r4, r5, r6, r7, r8, pc}
> > >     9034:       ffff0fc0        .word   0xffff0fc0
> > > 
> > > Where sadly the appropriate memory barriers are missing, and even the
> > > appropriate ldrex/teq, strexeq sequence is missing. So not only is this
> > > incorrect in terms of memory barriers, but also in terms of atomicity. 
> > > Argh. I
> > > don't know if a compiler more recent than 4.3.3 would do better though, 
> > > but I
> > > start to think that it would be wise to stay far away from gcc __sync_*()
> > > primitives. For ARM at least.
> > 
> > The way it was explained to me is that the "blx r8" above branches to a
> > code page supplied by the kernel, and that this page contains the memory
> > barriers and atomic instructions.  The "x" in the "blx" switches from
> > ARM to Thumb instruction-set format, just to keep things interesting.
> > 
> > The address is 0xffff0fc0, the value at address 0x9035 above.  Naturally,
> > gdb refuses to let me look at this address range.  My attempt to access
> > this range from inside the program also fails.  Which is not a surprise,
> > it is not supposed to be mapped.  Assuming you have a multicore ARMv6
> > or later, the kernel supplies the following code at this address (see
> > arch/arm/kernel/entry-armv.S):
> > 
> >             smp_dmb
> >     1:      ldrex   r3, [r2]
> >             subs    r3, r3, r0
> >             strexeq r3, r1, [r2]
> >             teqeq   r3, #1
> >             beq     1b
> >             rsbs    r0, r3, #0
> >             /* beware -- each __kuser slot must be 8 instructions max */
> >     #ifdef CONFIG_SMP
> >             b       __kuser_memory_barrier
> >     #else
> >             usr_ret lr
> >     #endif
> > 
> > The smp_dmb is an assembler macro (haven't come across one of those for
> > like 30 years!!!) that expands to different things depending on the ARM
> > architecture level that the kernel is built for.  The ldrex and strexeq
> > are ARM's atomic instructions, sort of like larx/stcx on ppc.
> > 
> > This is admittedly a bit involuted, but the really nice thing about it
> > is that it works on a wider variety of ARM architectures.
> > 
> > This approach OK for you?
> 
> Yep, sounds fine :) I wonder if we could document how far back we expect ARM
> Linux kernels to work (e.g. not earlier than 2.6.15). I guess this in-kernel
> cmpxchg has not been there forever.

OK, updating comments and commit message and reposting.

                                                        Thanx, Paul

_______________________________________________
ltt-dev mailing list
[email protected]
http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev

Reply via email to