On Mon, Nov 28, 2011 at 05:23:37PM +0100, Jakub Jelinek wrote:
> On Mon, Nov 28, 2011 at 09:23:43AM +1030, Alan Modra wrote:
> > +  int count = *sem;
> > +
> > +  while ((count & 0x7fffffff) != 0)
> > +    {
> > +      int oldval = count;
> > +      __atomic_compare_exchange_4 (sem, &oldval, count - 1,
> > +                                   false, MEMMODEL_ACQUIRE, MEMMODEL_RELAXED);
> > +      if (__builtin_expect (oldval == count, 1))
> > +        return;
>
> Why aren't you instead testing the return value of
> __atomic_compare_exchange_4 (here and in other cases)?

If you use the return value on powerpc, you find that it requires two
load immediate instructions (loading zero and one), and a compare
against zero.  The code as written needs three fewer instructions,
because the oldval == count comparison has already been done inside
the atomic sequence.  I'd expect fewer on most other architectures
unless they happen to have a compare and exchange instruction that
sets condition bits (i.e. Intel).  Even on Intel the way I've written
the code shouldn't take more instructions with a properly written
cmpxchg rtl description.  Does it?

Hmm, I suppose you could argue that powerpc and others ought not to
generate those three extra instructions when using the return value.
I'll see about fixing powerpc.

-- 
Alan Modra
Australia Development Lab, IBM
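
For illustration, the two forms discussed above can be put side by side.
This is a minimal sketch rather than the libgomp code: it substitutes the
generic __atomic_compare_exchange_n builtin and the __ATOMIC_* constants
for __atomic_compare_exchange_4 and libgomp's MEMMODEL_* names, the
function names try_wait_cmp_oldval and try_wait_retval are hypothetical,
and the retry step after a failed exchange is an assumption, since the
quoted hunk stops at the return.

```c
#include <stdbool.h>

/* Hypothetical helper: success is decided by comparing the value the
   builtin left in oldval against the value we expected (the form in
   the quoted patch).  */
static bool
try_wait_cmp_oldval (int *sem)
{
  int count = __atomic_load_n (sem, __ATOMIC_RELAXED);

  while ((count & 0x7fffffff) != 0)
    {
      int oldval = count;
      __atomic_compare_exchange_n (sem, &oldval, count - 1, false,
                                   __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
      if (__builtin_expect (oldval == count, 1))
        return true;
      /* Assumption: on failure, retry with the value actually seen.  */
      count = oldval;
    }
  return false;
}

/* Hypothetical helper: success is decided by the builtin's boolean
   return value (the form suggested in the review).  */
static bool
try_wait_retval (int *sem)
{
  int count = __atomic_load_n (sem, __ATOMIC_RELAXED);

  while ((count & 0x7fffffff) != 0)
    {
      /* On failure the builtin writes the current *sem into count,
         so the loop condition is re-evaluated with a fresh value.  */
      bool ok = __atomic_compare_exchange_n (sem, &count, count - 1, false,
                                             __ATOMIC_ACQUIRE,
                                             __ATOMIC_RELAXED);
      if (__builtin_expect (ok, 1))
        return true;
    }
  return false;
}
```

Both functions decrement the counter the same way; only the success
test differs.  Compiling each with something like gcc -O2 -S on the
target of interest is one way to compare the instruction sequences
the two forms produce.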