I said elsewhere that I would convert this to __atomic, but then I re-read my commentary about using cmpxchg *without* a lock prefix. What we want here is an operation that is more or less non-interruptible, rather than atomic. And apparently I benchmarked this a while back as a 10x performance improvement.
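For readers who don't have cacheline.h in front of them, here is a rough sketch of what the masked store computes, written with the __atomic builtin for comparison. The names expand_byte_mask and store_mask_portable are made up for illustration and are not in libitm, and the sketch assumes bit i of the mask selects byte i of the word. Note that on x86 this builtin emits a *locked* cmpxchg, which is exactly the cost the unlocked version avoids.

```cpp
#include <cstdint>

// Hypothetical helper, not part of libitm: expand a per-byte mask
// (assumption: bit i selects byte i) into a per-bit mask for a 32-bit word.
static inline uint32_t expand_byte_mask(uint8_t m)
{
    uint32_t bm = 0;
    for (int i = 0; i < 4; ++i)
        if (m & (1u << i))
            bm |= 0xffu << (8 * i);
    return bm;
}

// Hypothetical portable stand-in for store_mask: merge the selected bytes
// of s into *d and retry until the compare-and-swap succeeds. Unlike the
// libitm code, the builtin below uses a locked cmpxchg on x86.
static inline void store_mask_portable(uint32_t *d, uint32_t s, uint8_t m)
{
    const uint32_t bm = expand_byte_mask(m);
    uint32_t o = *d;  // expected old value
    uint32_t n;
    do {
        // Keep the unselected bytes of the old value, take the rest from s.
        n = (o & ~bm) | (s & bm);
        // On failure the builtin refreshes o with the current contents of *d.
    } while (!__atomic_compare_exchange_n(d, &o, n, /*weak=*/true,
                                          __ATOMIC_RELAXED, __ATOMIC_RELAXED));
}
```

With d holding 0x11223344, a call such as store_mask_portable(&d, 0xAABBCCDD, 0x5) replaces only bytes 0 and 2 and leaves the others alone.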
Seems like the easiest thing is simply to emit the hint prefix with .byte instead of using the ,pn suffix. Committed.

r~

commit f3210a53394de39a8aa74ec9dcb23f2cc0551322
Author: rth <rth@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Wed Nov 9 19:51:49 2011 +0000

    libitm: Avoid non-portable x86 branch prediction mnemonic.

    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@181233 138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libitm/ChangeLog b/libitm/ChangeLog
index e78716d..0501d16 100644
--- a/libitm/ChangeLog
+++ b/libitm/ChangeLog
@@ -1,5 +1,8 @@
 2011-11-09 Richard Henderson <r...@redhat.com>
 
+	* config/x86/cacheline.h (gtm_cacheline::store_mask): Use .byte
+	to emit branch prediction hint.
+
 	* config/x86/sjlj.S: Protect elf directives with __ELF__.
 	Protect .note.GNU-stack with __linux__.
 
diff --git a/libitm/config/x86/cacheline.h b/libitm/config/x86/cacheline.h
index 15a95b0..f91d7cc 100644
--- a/libitm/config/x86/cacheline.h
+++ b/libitm/config/x86/cacheline.h
@@ -144,7 +144,7 @@ gtm_cacheline::operator= (const gtm_cacheline & __restrict s)
 }
 #endif
 
-// ??? Support masked integer stores more efficiently with an unlocked cmpxchg
+// Support masked integer stores more efficiently with an unlocked cmpxchg
 // insn. My reasoning is that while we write to locations that we do not wish
 // to modify, we do it in an uninterruptable insn, and so we either truely
 // write back the original data or the insn fails -- unlike with a
@@ -171,7 +171,8 @@ gtm_cacheline::store_mask (uint32_t *d, uint32_t s, uint8_t m)
           "and %[m], %[n]\n\t"
           "or %[s], %[n]\n\t"
           "cmpxchg %[n], %[d]\n\t"
-          "jnz,pn 0b"
+          ".byte 0x2e\n\t" // predict not-taken, aka jnz,pn
+          "jnz 0b"
           : [d] "+m"(*d), [n] "=&r" (n), [o] "+a"(o)
           : [s] "r" (s & bm), [m] "r" (~bm));
 }
@@ -198,7 +199,8 @@ gtm_cacheline::store_mask (uint64_t *d, uint64_t s, uint8_t m)
           "and %[m], %[n]\n\t"
           "or %[s], %[n]\n\t"
           "cmpxchg %[n], %[d]\n\t"
-          "jnz,pn 0b"
+          ".byte 0x2e\n\t" // predict not-taken, aka jnz,pn
+          "jnz 0b"
           : [d] "+m"(*d), [n] "=&r" (n), [o] "+a"(o)
           : [s] "r" (s & bm), [m] "r" (~bm));
 #else