Re: [PATCH, AArch64 v2 05/11] aarch64: Emit LSE st instructions

Richard Henderson Wed, 31 Oct 2018 10:08:06 -0700

On 10/31/18 3:04 PM, Will Deacon wrote:
> The example test above uses relaxed atomics in conjunction with an acquire
> fence, so I don't think we can actually use ST<op> at all without a change
> to the language specification. I previously allocated P0861 for this purpose
> but never got a chance to write it up...
> 
> Perhaps the issue is a bit clearer with an additional thread (not often I
> say that!):
> 
> 
> P0 (atomic_int* y,atomic_int* x) {
>   atomic_store_explicit(x,1,memory_order_relaxed);
>   atomic_thread_fence(memory_order_release);
>   atomic_store_explicit(y,1,memory_order_relaxed);
> }
> 
> P1 (atomic_int* y,atomic_int* x) {
>   atomic_fetch_add_explicit(y,1,memory_order_relaxed);        // STADD
>   atomic_thread_fence(memory_order_acquire);
>   int r0 = atomic_load_explicit(x,memory_order_relaxed);
> }
> 
> P2 (atomic_int* y) {
>   int r1 = atomic_load_explicit(y,memory_order_relaxed);
> }
> 
> 
> My understanding is that it is forbidden for r0 == 0 and r1 == 2 after
> this test has executed. However, if the relaxed add in P1 compiles to
> STADD and the subsequent acquire fence is compiled as DMB LD, then we
> don't have any ordering guarantees in P1 and the forbidden result could
> be observed.
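One rough way to exercise the three-thread test on real hardware is a POSIX-threads stress loop that counts how often the forbidden outcome (r0 == 0 && r1 == 2) shows up. This is a sketch only. `run_litmus` and the thread function names are hypothetical. A zero count means the outcome was not observed in that run, not that it is impossible, and thread start-up skew makes the interesting races rare. A dedicated litmus tool is the more reliable way to check this kind of test. If the compilation is correct, the count should stay at zero.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Shared locations from the litmus test. */
static atomic_int x, y;
/* Values observed by P1 (r0) and P2 (r1); read only after pthread_join. */
static int r0, r1;

static void *p0(void *arg) {
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    return NULL;
}

static void *p1(void *arg) {
    (void)arg;
    /* Return value discarded: this is the RMW that could be lowered to STADD. */
    atomic_fetch_add_explicit(&y, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);
    r0 = atomic_load_explicit(&x, memory_order_relaxed);
    return NULL;
}

static void *p2(void *arg) {
    (void)arg;
    r1 = atomic_load_explicit(&y, memory_order_relaxed);
    return NULL;
}

/* Run the test `iterations` times and return how often r0 == 0 && r1 == 2
   was observed (hypothetical helper, not part of any existing test suite). */
static long run_litmus(long iterations) {
    long forbidden = 0;
    for (long i = 0; i < iterations; i++) {
        pthread_t t0, t1, t2;
        atomic_store(&x, 0);
        atomic_store(&y, 0);
        r0 = -1;
        r1 = -1;
        pthread_create(&t0, NULL, p0, NULL);
        pthread_create(&t1, NULL, p1, NULL);
        pthread_create(&t2, NULL, p2, NULL);
        pthread_join(t0, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        if (r0 == 0 && r1 == 2)
            forbidden++;
    }
    return forbidden;
}
```

Calling `run_litmus` from a `main` built with an LSE-enabled AArch64 compiler, once with the fetch-add emitted as LDADD and once as STADD, would show whether the weaker lowering ever exposes the forbidden result on a given machine.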


I suppose I don't understand exactly what you're saying.

I can see that, yes, if you split the fetch-add from the acquire in P1 you get
the incorrect results you describe.  But isn't that a bug in the test itself?
Wouldn't the only correct version be

P1 (atomic_int* y, atomic_int* x) {
  atomic_fetch_add_explicit(y, 1, memory_order_acquire);
  int r0 = atomic_load_explicit(x, memory_order_relaxed);
}

at which point we would use LDADDA for the fetch-add rather than STADD.

If the problem is more fundamental than this, would you have another go at
explaining?  In particular, I don't see the difference between

        ldadd   val, scratch, [base]
  vs
        stadd   val, [base]

and

        ldaddl  val, scratch, [base]
  vs
        staddl  val, [base]

where both pairs of instructions have the same memory ordering semantics.
Currently we are always producing the ld version of each pair.
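The instruction pairs above can be compared by starting from C sources that differ only in whether the fetched value is used. These can be compiled for an LSE-enabled target, for example with `gcc -O2 -march=armv8.1-a -S`, and the assembly inspected. The function names below are hypothetical. Which instruction a particular GCC version actually emits for the discarded-result cases is exactly what this patch changes, so the comments describe candidates rather than guaranteed output.

```c
#include <stdatomic.h>

/* Relaxed RMW with the old value discarded: candidates are
   "ldadd val, scratch, [base]" or "stadd val, [base]". */
void add_relaxed_discard(atomic_int *base, int val) {
    (void)atomic_fetch_add_explicit(base, val, memory_order_relaxed);
}

/* Relaxed RMW whose old value is returned: needs a destination
   register, so only the LD<op> form (LDADD) fits. */
int add_relaxed_fetch(atomic_int *base, int val) {
    return atomic_fetch_add_explicit(base, val, memory_order_relaxed);
}

/* Release variants, corresponding to the ldaddl / staddl pair. */
void add_release_discard(atomic_int *base, int val) {
    (void)atomic_fetch_add_explicit(base, val, memory_order_release);
}

int add_release_fetch(atomic_int *base, int val) {
    return atomic_fetch_add_explicit(base, val, memory_order_release);
}
```

Only the discarded-result functions are candidates for the ST<op> forms. The acquire and acq_rel orderings are left out because the ST<op> encodings have no acquiring variant.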


r~
