On 11/12/2010 01:38 PM, Mathieu Desnoyers wrote:
> But I wonder if, on any architecture, there is a significantly better
> way to implement:
> uatomic_xchg_add/uatomic_xchg_or (returning the old value)
               ^^^

Assuming you mean "and" (for "add" there is an obvious benefit on x86,
which has XADD), there is some benefit on LL/SC architectures, where you
can do:

    ll   reg, mem
    and  temp, reg, val
    sc   mem, temp
    <redo upon lost reservation>
    isync

instead of this more complicated code using cmpxchg:

1:
    ld   temp1, mem          ; normal load of old value
    and  temp2, temp1, val   ; compute new one
2:
    ll   reg, mem            ; reg = cmpxchg(&x, temp1, temp2)
    cmp  reg, temp1
    bne  1b
    sc   mem, temp2
    <redo from 2 upon lost reservation>
    isync

Actually, what you get from uatomic_ppc.h is even a bit worse:

1:
    ld   temp1, mem          ; normal load of old value
    and  temp2, temp1, val   ; compute new one
2:
    ll   reg, mem            ; reg = cmpxchg(&x, temp1, temp2)
    cmp  reg, temp1
    bne  3f
    sc   mem, temp2
    <redo from 2 upon lost reservation>
    isync
3:
    cmp  reg, temp1          ; compiler cannot optimize jump-to-jump
    bne  1b                  ; because "bne 3f" is inside an asm

Paolo
_______________________________________________
ltt-dev mailing list
http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev