I said elsewhere that I would convert this to __atomic, but then
I re-read my commentary about using cmpxchg *without* a lock prefix.
What we're looking for here is non-interruptibility rather than
atomicity.  And apparently I benchmarked this a while back as a
10x performance improvement.

Seems like the easiest thing is simply to use .byte instead of ,pn.

Committed.


r~
commit f3210a53394de39a8aa74ec9dcb23f2cc0551322
Author: rth <rth@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Wed Nov 9 19:51:49 2011 +0000

    libitm: Avoid non-portable x86 branch prediction mnemonic.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@181233 138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libitm/ChangeLog b/libitm/ChangeLog
index e78716d..0501d16 100644
--- a/libitm/ChangeLog
+++ b/libitm/ChangeLog
@@ -1,5 +1,8 @@
 2011-11-09  Richard Henderson  <r...@redhat.com>
 
+       * config/x86/cacheline.h (gtm_cacheline::store_mask): Use .byte
+       to emit branch prediction hint.
+
        * config/x86/sjlj.S: Protect elf directives with __ELF__.
        Protect .note.GNU-stack with __linux__.
 
diff --git a/libitm/config/x86/cacheline.h b/libitm/config/x86/cacheline.h
index 15a95b0..f91d7cc 100644
--- a/libitm/config/x86/cacheline.h
+++ b/libitm/config/x86/cacheline.h
@@ -144,7 +144,7 @@ gtm_cacheline::operator= (const gtm_cacheline & __restrict s)
 }
 #endif
 
-// ??? Support masked integer stores more efficiently with an unlocked cmpxchg
+// Support masked integer stores more efficiently with an unlocked cmpxchg
 // insn.  My reasoning is that while we write to locations that we do not wish
 // to modify, we do it in an uninterruptable insn, and so we either truely
 // write back the original data or the insn fails -- unlike with a
@@ -171,7 +171,8 @@ gtm_cacheline::store_mask (uint32_t *d, uint32_t s, uint8_t m)
                "and    %[m], %[n]\n\t"
                "or     %[s], %[n]\n\t"
                "cmpxchg %[n], %[d]\n\t"
-               "jnz,pn 0b"
+               ".byte  0x2e\n\t"       // predict not-taken, aka jnz,pn
+               "jnz    0b"
                : [d] "+m"(*d), [n] "=&r" (n), [o] "+a"(o)
                : [s] "r" (s & bm), [m] "r" (~bm));
        }
@@ -198,7 +199,8 @@ gtm_cacheline::store_mask (uint64_t *d, uint64_t s, uint8_t m)
                "and    %[m], %[n]\n\t"
                "or     %[s], %[n]\n\t"
                "cmpxchg %[n], %[d]\n\t"
-               "jnz,pn 0b"
+               ".byte  0x2e\n\t"       // predict not-taken, aka jnz,pn
+               "jnz    0b"
                : [d] "+m"(*d), [n] "=&r" (n), [o] "+a"(o)
                : [s] "r" (s & bm), [m] "r" (~bm));
 #else
