Hello!

On Mon, Dec 04, 2017 at 04:37:50PM +0000, debayang.qdt wrote:

> On some architectures, such as armv8a, newer GCC generates a full
> barrier for the __sync operations, unlike the __atomic ones.
> 
> This is seen to cause some performance penalty on these architectures when
> using __sync instead of the __atomic APIs under high contention.
> 
> The C++ atomic ops look good as well
> (http://mailman.nginx.org/pipermail/nginx-devel/2016-September/008805.html);
> however, I would like to test them and confirm.
> 
> e.g. __sync_fetch_and_add with newer GCC:
> 
>   58:   f94007e0        ldr     x0, [sp,#8]
>   5c:   c85f7c01        ldxr    x1, [x0]
>   60:   91000821        add     x1, x1, #0x2
>   64:   c802fc01        stlxr   w2, x1, [x0]
>   68:   35ffffa2        cbnz    w2, 5c <testing+0xc>
>   6c:   d5033bbf        dmb     ish
> 
> With __atomic_fetch_add with SEQ_CST:
> 
>   58:   f94007e0        ldr     x0, [sp,#8]
>   5c:   c85ffc01        ldaxr   x1, [x0]
>   60:   91000821        add     x1, x1, #0x2
>   64:   c802fc01        stlxr   w2, x1, [x0]
>   68:   35ffffa2        cbnz    w2, 5c <testing+0xc>
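
For reference, the two listings quoted above can be approximated from source along the following lines. This is only a minimal sketch: the names add_sync, add_atomic and demo are hypothetical, and the exact instructions emitted will depend on the GCC version, target flags and optimization level.

```c
#include <assert.h>
#include <stdint.h>

/* Legacy builtin: GCC documents the __sync family as full barriers. */
static uint64_t
add_sync(uint64_t *p, uint64_t n)
{
    return __sync_fetch_and_add(p, n);
}

/* Newer builtin: the memory order is passed explicitly. */
static uint64_t
add_atomic(uint64_t *p, uint64_t n)
{
    return __atomic_fetch_add(p, n, __ATOMIC_SEQ_CST);
}

/* Both builtins return the value held before the addition. */
static void
demo(void)
{
    uint64_t  v = 0;

    assert(add_sync(&v, 2) == 0);
    assert(add_atomic(&v, 2) == 2);
    assert(v == 4);
}
```

Compiling this for aarch64 and disassembling the object file should make it possible to compare the two sequences, in particular whether a trailing "dmb ish" is present.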

>> Well, this may actually mean that the __atomic and stdatomic variants won't
>> work for us, as they do not seem to imply a barrier protecting other
>> variables.  While it may not be important for many uses of
>> ngx_atomic_fetch_add(), it is certainly important for
>> ngx_atomic_cmp_set(), which we use for shared memory mutexes, where it is
>> assumed to be a full barrier at least for the memory area the mutex protects.

>> (Just for the record, the GCC change in question seems to be documented at
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697.)

Thanks for the link.  As per the discussion there, the legacy __sync calls
were fixed in gcc 5+ to generate a barrier on the aarch64 platform, to conform
to the __sync full-barrier specification.

IMO, if the weaker variants or __atomic help in multiple cases, as you
mentioned, it may still make sense to use them rather than always using the
__sync calls as a catch-all synchronization mechanism, because the latter is
costly on some architectures.
For the specific scenarios where we need strong memory ordering, we can still
use the atomics with an explicit seq_cst memory model, together with a
standalone fence, depending on context.
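
To illustrate the split suggested above, here is a rough sketch and not a proposed patch. The names counter_add_relaxed, cmp_set_full and demo are hypothetical, and whether a seq_cst compare-and-swap followed by a standalone fence is sufficient for the shared memory mutexes would still need to be verified per architecture.

```c
#include <assert.h>
#include <stdint.h>

/* Statistics-style counter: atomicity only, no ordering of other memory. */
static uint64_t
counter_add_relaxed(uint64_t *p, uint64_t n)
{
    return __atomic_fetch_add(p, n, __ATOMIC_RELAXED);
}

/*
 * Compare-and-set for lock-like uses: a seq_cst CAS followed by a
 * standalone full fence (assumption: this approximates the full-barrier
 * behaviour expected from the __sync builtins).
 * Returns 1 if *lock was equal to old and has been set to set, else 0.
 */
static int
cmp_set_full(uint64_t *lock, uint64_t old, uint64_t set)
{
    int  ok;

    ok = __atomic_compare_exchange_n(lock, &old, set, 0,
                                     __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    __atomic_thread_fence(__ATOMIC_SEQ_CST);

    return ok;
}

/* Single-threaded usage example. */
static void
demo(void)
{
    uint64_t  lock = 0, hits = 0;

    assert(cmp_set_full(&lock, 0, 1) == 1);     /* lock acquired */
    assert(cmp_set_full(&lock, 0, 1) == 0);     /* already held */
    assert(counter_add_relaxed(&hits, 1) == 0); /* returns old value */

    __atomic_store_n(&lock, 0, __ATOMIC_RELEASE);   /* unlock */
    assert(lock == 0 && hits == 1);
}
```

The idea is that only the lock-like paths pay for the full fence, while plain counters use the cheaper relaxed form.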


-- Debayan Ghosh
_______________________________________________
nginx-devel mailing list
[email protected]
http://mailman.nginx.org/mailman/listinfo/nginx-devel
