* Paul E. McKenney ([email protected]) wrote: > On Wed, Jun 16, 2010 at 07:57:55PM -0400, Mathieu Desnoyers wrote: > > * Paul E. McKenney ([email protected]) wrote: > > > On Wed, Jun 16, 2010 at 05:23:15PM -0400, Mathieu Desnoyers wrote: > > > > * Paul E. McKenney ([email protected]) wrote: > > > > > Add native support for armv7l. Other variants of ARM will likely > > > > > require > > > > > separate ports. > > > > > > > > > > Signed-off-by: Paul E. McKenney <[email protected]> > > > > > --- > > > > > configure.ac | 4 +++ > > > > > urcu/arch_armv7l.h | 59 > > > > > ++++++++++++++++++++++++++++++++++++++++++++ > > > > > urcu/uatomic_arch_armv7l.h | 48 +++++++++++++++++++++++++++++++++++ > > > > > > [ . . . ] > > > > > > > > +#ifdef __cplusplus > > > > > +extern "C" { > > > > > +#endif > > > > > + > > > > > +/* xchg */ > > > > > +#define uatomic_xchg(addr, v) __sync_lock_test_and_set(addr, v) > > > > > + > > > > > +/* cmpxchg */ > > > > > +#define uatomic_cmpxchg(addr, old, _new) \ > > > > > + __sync_val_compare_and_swap(addr, old, _new) > > > > > + > > > > > +/* uatomic_add_return */ > > > > > +#define uatomic_add_return(addr, v) __sync_add_and_fetch(addr, v) > > > > > > > > So, do we end up trusting that gcc got the memory barriers right in the > > > > ARM > > > > __sync_() primitives ? That sounds unlikely. > > > > > > > > I'd vote for surrounding these primitives with smp_mb(). > > > > > > On ARM, my current belief is that the primitives other than > > > __sync_synchronize() and __sync_lock_release() are set up correctly. > > > > > > However, I must defer to Paolo and Uli on this. > > > > There is nothing like a quick test to see the result: > > > > With a arm-linux-cs2009q1-203sb1 scratchbox compiler (gcc 4.3.3, provided by > > Nokia for the Omap3): > > > > arm-none-linux-gnueabi-gcc-4.3.3 (Sourcery G++ Lite 2009q1-203) 4.3.3 > > > > I compile, with > > > > /scratchbox/compilers/arm-linux-cs2009q1-203sb1/bin/arm-none-linux-gnueabi-gcc-4.3.3 > > -mcpu=cortex-a9 -mtune=cortex-a9 -O2 -o armtest armtest.c > > > > the following program: > > > > int a; > > > > int > > f() > > { > > __sync_val_compare_and_swap(&a, 4, 1); > > //__sync_lock_test_and_set(&a, 1); > > //__sync_add_and_fetch(&a, 1); > > //__sync_synchronize(); > > } > > > > int main() > > { > > f(); > > } > > > > and get: > > > > /scratchbox/compilers/arm-linux-cs2009q1-203sb1/bin/arm-none-linux-gnueabi-objdump > > -S armtest > > > > [...] > > > > > > 000083cc <f>: > > 83cc: e59f0008 ldr r0, [pc, #8] ; 83dc <f+0x10> > > 83d0: e3a01004 mov r1, #4 ; 0x4 > > 83d4: e3a02001 mov r2, #1 ; 0x1 > > 83d8: ea000305 b 8ff4 <__sync_val_compare_and_swap_4> > > 83dc: 00011524 .word 0x00011524 > > > > [...] > > > > 00008ff4 <__sync_val_compare_and_swap_4>: > > 8ff4: e92d41f0 push {r4, r5, r6, r7, r8, lr} > > 8ff8: e59f8034 ldr r8, [pc, #52] ; 9034 > > <__sync_val_compare_and_swap_4+0x40> > > 8ffc: e1a06000 mov r6, r0 > > 9000: e1a05001 mov r5, r1 > > 9004: e1a07002 mov r7, r2 > > 9008: e5964000 ldr r4, [r6] > > 900c: e1a00005 mov r0, r5 > > 9010: e1550004 cmp r5, r4 > > 9014: e1a01007 mov r1, r7 > > 9018: e1a02006 mov r2, r6 > > 901c: 1a000002 bne 902c > > <__sync_val_compare_and_swap_4+0x38> > > 9020: e12fff38 blx r8 > > 9024: e3500000 cmp r0, #0 ; 0x0 > > 9028: 1afffff6 bne 9008 > > <__sync_val_compare_and_swap_4+0x14> > > 902c: e1a00004 mov r0, r4 > > 9030: e8bd81f0 pop {r4, r5, r6, r7, r8, pc} > > 9034: ffff0fc0 .word 0xffff0fc0 > > > > Where sadly the appropriate memory barriers are missing, and even the > > appropriate ldrex/teq, strexeq sequence is missing. So not only is this > > incorrect in terms of memory barriers, but also in terms of atomicity. > > Argh. I > > don't know if a compiler more recent than 4.3.3 would do better though, but > > I > > start to think that it would be wise to stay far away from gcc __sync_*() > > primitives. For ARM at least. > > The way it was explained to me is that the "blx r8" above branches to a > code page supplied by the kernel, and that this page contains the memory > barriers and atomic instructions. The "x" in the "blx" switches from > ARM to Thumb instruction-set format, just to keep things interesting. > > The address is 0xffff0fc0, the value at address 0x9035 above. Naturally, > gdb refuses to let me look at this address range. My attempt to access > this range from inside the program also fails. Which is not a surprise, > it is not supposed to be mapped. Assuming you have a multicore ARMv6 > or later, the kernel supplies the following code at this address (see > arch/arm/kernel/entry-armv.S): > > smp_dmb > 1: ldrex r3, [r2] > subs r3, r3, r0 > strexeq r3, r1, [r2] > teqeq r3, #1 > beq 1b > rsbs r0, r3, #0 > /* beware -- each __kuser slot must be 8 instructions max */ > #ifdef CONFIG_SMP > b __kuser_memory_barrier > #else > usr_ret lr > #endif > > The smp_dmb is an assembler macro (haven't come across one of those for > like 30 years!!!) that expands to different things depending on the ARM > architecture level that the kernel is built for. The ldrex and strexeq > are ARM's atomic instructions, sort of like larx/stcx on ppc. > > This is admittedly a bit involuted, but the really nice thing about it > is that it works on a wider variety of ARM architectures. > > This approach OK for you?
Yep, sounds fine :) I wonder if we could document how far back we expect ARM Linux kernels to work (e.g. not earlier than 2.6.15). I guess this in-kernel cmpxchg has not been there forever. Thanks, Mathieu -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com _______________________________________________ ltt-dev mailing list [email protected] http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev
