On Wed, Aug 09, 2017 at 05:06:03PM +0200, Peter Zijlstra wrote:
> Now, ARM64 for instance plays funny games, it does something along the
> lines of:
>
> 	cmpxchg(ptr, old, new)
> 	{
> 		do {
> 			r = LL(ptr);
> 			if (r != old)
> 				return r; /* no barriers */
> 			r = new
> 		} while (SC_release(ptr, r));
> 		smp_mb();
> 		return r;
> 	}
>
> Thereby ordering things relative to the store on ptr, but the load can
> very much escape. The thinking is that if success, we must observe the
> latest value of ptr, but even in that case the load is not ordered and
> could happen before.
>
> However, since we're guaranteed to observe the latest value of ptr (on
> success) it doesn't matter if we reordered the load, there is no newer
> value possible.
>
> So heaps of tricky, but correct afaict. Will?

And could not PPC do something similar:

	cmpxchg(ptr, old, new)
	{
		lwsync();
		do {
			r = LL(ptr);
			if (r != old)
				return r;
			r = new;
		} while (SC(ptr, r));
		sync();
		return r;
	}

?

The lwsync would make it store-release on SC with similar reasoning as
above.

And lwsync allows 'stores reordered after loads', which allows the prior
smp_store_release() to leak past.

Or is the reason this doesn't work on PPC that it's RCpc?
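
For anyone who wants to experiment with the shape of that sequence outside the kernel, here is a rough userspace sketch in C11 atomics. It is only an approximation under stated assumptions: cmpxchg_sketch() is a made-up name, the release fence stands in for lwsync() and the seq_cst fence for sync() (which is what compilers typically emit for them on PowerPC), and a relaxed weak compare-exchange stands in for the LL/SC loop. The C11 memory model is not the kernel's, so this shows the structure and proves nothing about the ordering question above. Unlike the pseudocode, it returns the previously observed value on success, as the kernel's cmpxchg() does.

```c
#include <stdatomic.h>

/*
 * Hypothetical userspace analogue of the proposed PPC cmpxchg().
 * Returns the value observed in *ptr; the swap happened iff that
 * value equals 'old'.
 */
int cmpxchg_sketch(_Atomic int *ptr, int old, int new_val)
{
	int seen;

	/* Assumption: stands in for lwsync() before the LL/SC loop. */
	atomic_thread_fence(memory_order_release);

	for (;;) {
		seen = old;
		/* Relaxed weak CAS: may fail spuriously, like a lost SC. */
		if (atomic_compare_exchange_weak_explicit(ptr, &seen, new_val,
				memory_order_relaxed, memory_order_relaxed))
			break;
		/* Genuine mismatch: bail out with no trailing barrier. */
		if (seen != old)
			return seen;
		/* Spurious failure (seen == old): retry. */
	}

	/* Assumption: stands in for sync() on the success path only. */
	atomic_thread_fence(memory_order_seq_cst);
	return seen;
}
```

As in the pseudocode, the failure path skips the trailing full barrier, so only a successful exchange pays for it.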