* Nick Piggin ([EMAIL PROTECTED]) wrote:
> Mathieu Desnoyers wrote:
>
> >I tried it with and without the LOCK prefix on my Pentium 4.
> >
> >Locked cmpxchg8b : 90 cycles
> >Non locked cmpxchg8b: 30 cycles
> >sti: 166 cycles
> >cli: 159 cycles
> >
> >So, hrm, even if we use the locked version, it is still much faster than
> >the sti/cli. I am thoughtful about the comment in asm-i386/system.h:
>
> Curious: what does it look like if the memory is not in cache? I
> found that cmpxchg is relatively slower than other rmw instructions
> in that case.
>

Actually, I have just seen that cmpxchg64 and cmpxchg64_local do exactly
this, and they are already implemented in asm-i386/system.h.

A quick test: I do a clflush in a loop (subtracting its time from the
following loops) so that each cmpxchg misses the cache and has to go to
memory. These are the results for the cmpxchg8b alone:

non locked cmpxchg8b: 583.37 cycles
locked cmpxchg8b:     650.48 cycles
rmw in 3 operations:  581.43 cycles

So the locked cmpxchg is 67 cycles slower than the non locked cmpxchg,
which fits with my 30 vs 90 cycles measured in cache (a 60-cycle gap).
rmw is a tiny bit faster than cmpxchg8b (2 cycles), but nothing to write
home about.

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
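
For reference, the clflush-based measurement described above could be
sketched roughly as follows. This is not the code that produced the
numbers in this mail: it is a user-space approximation assuming x86 or
x86-64 with GCC or Clang, and the helper names (flush_line,
cmpxchg8b_nolock, cmpxchg8b_locked, avg_cycles_uncached) are made up for
illustration. It times a flush-only loop first, then a flush-plus-cmpxchg8b
loop, and reports the difference per iteration.

```c
/* Sketch only: x86 / x86-64, GCC or Clang. All helper names are hypothetical. */
#include <stdint.h>
#include <x86intrin.h>          /* __rdtsc() */

/* Evict the cache line holding *p, then fence so the flush has completed. */
static inline void flush_line(volatile uint64_t *p)
{
    __asm__ __volatile__("clflush %0\n\tmfence" : "+m"(*p) : : "memory");
}

/* cmpxchg8b without LOCK: atomic only with respect to the local CPU.
 * Returns the value that was in *ptr before the operation. */
static inline uint64_t cmpxchg8b_nolock(volatile uint64_t *ptr,
                                        uint64_t old, uint64_t new_val)
{
    uint32_t lo = (uint32_t)old, hi = (uint32_t)(old >> 32);
    __asm__ __volatile__("cmpxchg8b %2"
                         : "+a"(lo), "+d"(hi), "+m"(*ptr)
                         : "b"((uint32_t)new_val),
                           "c"((uint32_t)(new_val >> 32))
                         : "cc", "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* Same operation with the LOCK prefix: atomic across CPUs. */
static inline uint64_t cmpxchg8b_locked(volatile uint64_t *ptr,
                                        uint64_t old, uint64_t new_val)
{
    uint32_t lo = (uint32_t)old, hi = (uint32_t)(old >> 32);
    __asm__ __volatile__("lock; cmpxchg8b %2"
                         : "+a"(lo), "+d"(hi), "+m"(*ptr)
                         : "b"((uint32_t)new_val),
                           "c"((uint32_t)(new_val >> 32))
                         : "cc", "memory");
    return ((uint64_t)hi << 32) | lo;
}

typedef uint64_t (*cmpxchg_fn)(volatile uint64_t *, uint64_t, uint64_t);

/* Average cycles per uncached op: (flush + op) time minus flush-only time. */
static uint64_t avg_cycles_uncached(cmpxchg_fn op, unsigned iters)
{
    static volatile uint64_t target;
    uint64_t flush_only = 0, flush_and_op = 0, t0;
    unsigned i;

    if (iters == 0)
        return 0;

    /* Pass 1: cost of the flush alone. */
    for (i = 0; i < iters; i++) {
        t0 = __rdtsc();
        flush_line(&target);
        flush_only += __rdtsc() - t0;
    }

    /* Pass 2: flush, then a cmpxchg8b that must go to memory.
     * The expected value is tracked in i, so target is not read first. */
    target = 0;
    for (i = 0; i < iters; i++) {
        t0 = __rdtsc();
        flush_line(&target);
        op(&target, i, (uint64_t)i + 1);
        flush_and_op += __rdtsc() - t0;
    }

    return flush_and_op > flush_only ? (flush_and_op - flush_only) / iters : 0;
}
```

Calling avg_cycles_uncached(cmpxchg8b_nolock, n) and
avg_cycles_uncached(cmpxchg8b_locked, n) with a large n gives two figures
to compare. Absolute values will differ from the ones above, since rdtsc
is not serialized here and user space can be preempted or interrupted.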