On Thu, May 19, 2016 at 10:52:19AM +0100, David Howells wrote:
> Peter Zijlstra <[email protected]> wrote:
> 
> > Does this generate 'sane' code for LL/SC archs? That is, a single LL/SC
> > loop and not a loop around an LL/SC cmpxchg.
> 
> Depends on your definition of 'sane'.  The code will work - but it's not
> necessarily the most optimal.  gcc currently keeps the __atomic_load_n() and
> the fudging in the middle separate from the __atomic_compare_exchange_n().
> 
> So on aarch64:
> 
>       static __always_inline int __atomic_add_unless(atomic_t *v,
>                                                      int addend, int unless)
>       {
>               int cur = __atomic_load_n(&v->counter, __ATOMIC_RELAXED);
>               int new;
> 
>               do {
>                       if (__builtin_expect(cur == unless, 0))
>                               break;
>                       new = cur + addend;
>               } while (!__atomic_compare_exchange_n(&v->counter,
>                                                     &cur, new,
>                                                     false,
>                                                     __ATOMIC_SEQ_CST,
>                                                     __ATOMIC_RELAXED));
>               return cur;
>       }
> 
>       int test_atomic_add_unless(atomic_t *counter)
>       {
>               return __atomic_add_unless(counter, 0x56, 0x23);
>       }

[...]

> I think the code it generates should look something like:
> 
>       test_atomic_add_unless:
>       .L7:
>               ldaxr   w1, [x0]                # __atomic_load_n()
>               cmp     w1, 35                  # } if (cur == unless)
>               beq     .L4                     # }     break
>               add     w2, w1, 86              # new = cur + addend
>               stlxr   w4, w2, [x0]            # try to store new
>               cbnz    w4, .L7                 # retry if the store-exclusive failed
>       .L4:
>               mov     w0, w1                  # return cur
>               ret
> 
> but that requires the compiler to split up the LDAXR and STLXR instructions
> and render arbitrary code between.  I suspect that might be quite a stretch.

... it's also weaker than the requirements of the kernel memory model.
See 8e86f0b409a4 ("arm64: atomics: fix use of acquire + release for full
barrier semantics") for the gory details.

Will
