> Now, ARM64 for instance plays funny games, it does something along the
> lines of:
> cmpxchg(ptr, old, new)
> {
>       do {
>               r = LL(ptr);
>               if (r != old)
>                       return r; /* no barriers */
>               r = new
>       } while (SC_release(ptr, r));
>       smp_mb();
>       return r;
> }
> Thereby ordering things relative to the store on ptr, but the load can
> very much escape. The thinking is that if success, we must observe the
> latest value of ptr, but even in that case the load is not ordered and
> could happen before.
> However, since we're guaranteed to observe the latest value of ptr (on
> success) it doesn't matter if we reordered the load, there is no newer
> value possible.
> So heaps of tricky, but correct afaict. Will?

And could not PPC do something similar:

cmpxchg(ptr, old, new)
{
        lwsync
        do {
                r = LL(ptr);
                if (r != old)
                        return r;
                r = new;
        } while (SC(ptr, r));
        return r;
}


The lwsync would make it a store-release on SC, with similar reasoning as
above.

And lwsync allows 'stores reordered after loads', which allows the prior
smp_store_release() to leak past.
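
For comparison, here is a rough userspace sketch of the same shape using
C11 atomics: no barriers on the failure path, a release store plus a full
fence on success. cmpxchg_sketch() is a made-up name, not a kernel API, and
the C11 memory model is not the kernel's, so treat the orderings as an
approximation of the pseudocode rather than what any architecture emits.

```c
#include <stdatomic.h>

/*
 * Hypothetical helper, not a kernel API. Returns the value observed
 * at *ptr: equal to 'old' on success, the conflicting value on failure.
 */
static int cmpxchg_sketch(_Atomic int *ptr, int old, int new_val)
{
	int expected = old;

	/* Success: store-release. Failure: relaxed load, no barriers. */
	if (!atomic_compare_exchange_strong_explicit(ptr, &expected, new_val,
						     memory_order_release,
						     memory_order_relaxed))
		return expected;

	/* Assumed stand-in for the trailing smp_mb() on the success path. */
	atomic_thread_fence(memory_order_seq_cst);
	return old;
}
```

Starting from a value of 1, cmpxchg_sketch(&v, 1, 2) succeeds and returns
1; a following cmpxchg_sketch(&v, 1, 3) fails, returns 2 and leaves v
untouched.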

Or is the reason this doesn't work on PPC that it's RCpc?

