On 11/12/2010 01:38 PM, Mathieu Desnoyers wrote:
> But I wonder if, on any architecture, there is a significantly better
> way to implement:
>
> uatomic_xchg_add/uatomic_xchg_or (returning the old value)
               ^^^

Assuming you mean "and" (for "add" there is an obvious benefit on x86,
which has XADD), there is some benefit on LL/SC architectures, where
you can do:

        ll  reg, mem
        and temp, reg, val
        sc  mem, temp
        <redo upon lost reservation>
        isync

instead of this more complicated code using cmpxchg:

  1:
        ld  temp1, mem          ; normal load of old value
        and temp2, temp1, val   ; compute new one
  2:
        ll  reg, mem            ; reg = cmpxchg(&x, temp1, temp2)
        cmp reg, temp1
        bne 1b
        sc  mem, temp2
        <redo from 2 upon lost reservation>
        isync
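For comparison at the C level, here is a minimal sketch of both shapes using GCC/Clang atomic builtins rather than the uatomic API. The function names fetch_and_direct and fetch_and_cmpxchg are hypothetical. The first asks the compiler for a single fetch-and-AND, which on LL/SC targets is typically lowered to one ll/and/sc loop like the first sequence; the second emulates the operation on top of a value-returning cmpxchg, mirroring the second sequence.

```c
/* Direct read-modify-write: one atomic fetch-and-AND that returns the
 * old value. On LL/SC targets GCC/Clang typically emit a single
 * load-reserved / and / store-conditional retry loop for this. */
static unsigned long fetch_and_direct(unsigned long *addr, unsigned long val)
{
    return __atomic_fetch_and(addr, val, __ATOMIC_SEQ_CST);
}

/* Emulation on top of a value-returning cmpxchg (hypothetical helper),
 * following the second sequence: plain load, compute the new value,
 * cmpxchg, and start over if the value changed in between. */
static unsigned long fetch_and_cmpxchg(unsigned long *addr, unsigned long val)
{
    unsigned long old, seen;

    for (;;) {
        old = __atomic_load_n(addr, __ATOMIC_RELAXED);  /* ld temp1, mem */
        /* reg = cmpxchg(&x, temp1, temp2); full barrier in GCC */
        seen = __sync_val_compare_and_swap(addr, old, old & val);
        if (seen == old)                                 /* cmp; bne 1b */
            return old;
    }
}
```

Both return the old value: starting from x == 0xff, fetch_and_direct(&x, 0x0f) returns 0xff and leaves x == 0x0f, and fetch_and_cmpxchg behaves the same way.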

Actually, what you get from uatomic_ppc.h is even a bit worse:

  1:
        ld  temp1, mem          ; normal load of old value
        and temp2, temp1, val   ; compute new one
  2:
        ll  reg, mem            ; reg = cmpxchg(&x, temp1, temp2)
        cmp reg, temp1
        bne 3f
        sc  mem, temp2
        <redo from 2 upon lost reservation>
        isync
  3:
        cmp reg, temp1          ; compiler cannot optimize jump-to-jump
        bne 1b                  ; because "bne 3f" is inside an asm
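One way to sidestep the duplicated comparison is to let the compiler see the branch instead of hiding it in an asm. A minimal sketch, assuming GCC/Clang's __atomic_compare_exchange_n builtin in place of an inline-asm cmpxchg; fetch_and_cas_builtin is a hypothetical name, not code from uatomic_ppc.h. The builtin returns a success flag and, on failure, writes the value it observed back into old, so the retry needs neither a separate plain load nor a second compare of a returned value. Whether the generated code is as tight as the hand-written ll/and/sc sequence depends on the compiler and target.

```c
/* Hypothetical helper: fetch-and-AND built on the compare-and-swap
 * builtin rather than an opaque inline-asm cmpxchg wrapper. */
static unsigned long fetch_and_cas_builtin(unsigned long *addr,
                                           unsigned long val)
{
    unsigned long old = __atomic_load_n(addr, __ATOMIC_RELAXED);

    /* On failure the builtin stores the value it found into 'old', and
     * 'old & val' is re-evaluated on the next call. The weak form may
     * fail spuriously, which is harmless inside this retry loop and
     * lets LL/SC targets treat a lost reservation as a failed attempt. */
    while (!__atomic_compare_exchange_n(addr, &old, old & val,
                                        1 /* weak */,
                                        __ATOMIC_SEQ_CST,
                                        __ATOMIC_RELAXED))
        ;
    return old;  /* value seen by the successful exchange */
}
```

For example, starting from z == 0xf0, fetch_and_cas_builtin(&z, 0x3c) returns 0xf0 and leaves z == 0x30.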

Paolo

_______________________________________________
ltt-dev mailing list
http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev