Torvald, thank you for your input, but I think this discussion is getting a 
little pointless. There is nothing else I can add, since the gcc folks are 
reluctant to make this change anyway. In my opinion, there is no compelling 
reason against such an implementation (it is perfectly fine with the standard; 
support for read-only memory is not guaranteed for atomic_load anyway). Even 
the binary compatibility concern that was mentioned is unlikely to be an issue 
if implemented as I described. And finally, this is something that can actually 
be useful in practice (at least as far as I can judge from my experience). By 
the way, this issue has already been raised multiple times during the last 
couple of years by different people who actually use it in various real 
projects (the bugs were eventually closed as 'INVALID').
All of the described challenges are purely technical and can easily be 
resolved. Moreover, clang/llvm chose this implementation, and it seems very 
logical and non-confusing to me. It certainly makes sense to expose hardware 
capabilities through standard interfaces whenever possible.

For my projects, I will simply fall back to my own implementation using inline 
assembly (at least for now) because, unfortunately, it is the only thing that 
is guaranteed to work outside of clang/llvm in the foreseeable future (__sync 
functions have some limitations and do not look like an attractive option 
either, by the way).
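
To make the read-only memory point concrete, here is a minimal sketch of a load emulated with a compare-and-swap, which is roughly how a wide load can be built on an RMW instruction. It is only an illustration under stated assumptions: the helper name load_via_cas is hypothetical, and it uses a 64-bit value instead of a 16-byte one so that it builds without -mcx16 or libatomic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from gcc or clang): emulate an atomic load
   with a compare-and-swap. A 64-bit value stands in for the 16-byte
   case so the sketch compiles without -mcx16 or libatomic. */
static uint64_t load_via_cas(uint64_t *p)
{
    uint64_t expected = 0;

    assert(p != NULL);
    /* If *p == 0, the CAS succeeds and stores 0 back, so the value is
       unchanged. Otherwise the CAS fails and the builtin copies the
       current value of *p into expected. Either way, expected ends up
       holding the value that was in *p. */
    __atomic_compare_exchange_n(p, &expected, 0, 0,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return expected;
}
```

Calling load_via_cas(&v) with v set to 42 returns 42 and leaves v unchanged. Because the operation is an RMW, the pointer has to refer to writable memory, which is exactly why such a load cannot take a pointer to const data placed in a read-only mapping.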



On Tuesday, February 27, 2018 11:21 AM, Torvald Riegel <trie...@redhat.com> wrote:
 

 On Tue, 2018-02-27 at 13:16 +0000, Ruslan Nikolaev via gcc wrote:
> > 3) Torvald pointed out further considerations such as users expecting 
> > lock-free atomic loads to be faster than stores.
> 
> Is it even true? Is it faster to use some global lock (implemented through 
> RMW) than a single RMW operation? If you use this global lock, you will not 
> get loads faster than stores.

If GCC declares a type as lock-free, atomic loads on this type will be
natively supported through some sort of load instruction.  That means
they are faster than stores under concurrent accesses, in particular
when there are concurrent atomic loads (for all major HW we care about).

If there is no natively supported atomic load, GCC will not declare the
type to be lock-free.

Nobody made a statement about the performance of locks vs. RMWs.
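
For readers who want to see what a given compiler declares, the following is a small sketch of how the lock-free property can be queried. The helper names are hypothetical; atomic_is_lock_free is the C11 interface, and __atomic_always_lock_free is a builtin provided by GCC and clang. The 16-byte answer depends on the target and on flags such as -mcx16, so no particular result is assumed here.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: C11 query for a 4-byte atomic object. */
static bool int_is_lock_free(void)
{
    atomic_int x = 0;
    return atomic_is_lock_free(&x);
}

/* Hypothetical helper: compile-time query for a 16-byte object using
   the GCC/clang builtin. A null second argument means "assume the
   typical alignment for an object of this size". */
static bool size16_always_lock_free(void)
{
    return __atomic_always_lock_free(16, 0);
}
```

On mainstream targets the first helper is expected to return true; the second reports whether the compiler treats 16-byte objects as always lock-free with the flags in use.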





   
