Thanks, everyone, for the feedback; it is very useful. I am just proposing to 
consider the change unless there are clear roadblocks. (Either design choice is 
probably OK with respect to the standard, formally speaking, but the change has 
some clear advantages.) I wrote a summary of pros & cons (which, of course, is 
slightly biased towards the change :) ).
I also opened Bug 84563 with the rationale.


Pros of the proposed approach:
1. Ability to use guaranteed lock-free double-width atomics (when mcx16 is 
specified for x86-64, and always for arm64) in a more or less portable manner 
across the supported architectures (without resorting to non-standard 
extensions or writing separate assembly code for each architecture). Hopefully, 
the behavior can also be made more or less consistent across different 
compilers over time. This is already the case for clang/llvm. As mentioned, 
double-width lock-free atomics have real practical use (ABA tags for pointers).

2. More likely to find a bug immediately if a programmer tries to do something 
that is not guaranteed by the standard (e.g., getting a segfault on read-only 
memory when using a double-width atomic_load). This is true even if mcx16 is not 
used, as most CPUs have cmpxchg16b, and libatomic will use it. On the other 
hand, an atomic_load implemented through locks may cause hard-to-find and 
hard-to-debug issues in signal handlers, interrupt contexts, etc., when a 
programmer erroneously assumes that atomic_load is non-blocking.

3. For arm64 the corresponding instructions are always available, so there is 
no need for the mcx16 flag or redirection to libatomic at all (libatomic may 
still keep the old implementation for backward compatibility).

4. Faster and easier-to-analyze code when mcx16 is specified.

5. Ability to tell for sure if the implementation is lock-free by checking the 
corresponding C11 flag when mcx16 is specified. When unspecified, the flag will 
be false to accommodate the worst-case scenario.

6. Consistent behavior on all platforms regardless of IFUNC, the mcx16 flag, 
etc. If cmpxchg16b is available, it is always used (platforms that do not 
support IFUNC will use function pointers for redirection). The only thing the 
mcx16 flag changes is removing the indirection to libatomic and giving a 
guaranteed lock_free flag for the corresponding types. (BTW, in practice, if 
you use the flag, you should already know what you are doing.)

7. Ability to finally deprecate old __sync builtins, and use new and more 
advanced __atomic everywhere.
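
To make the ABA-tag use case from item 1 concrete, here is a minimal sketch, 
not a proposed implementation. All helper names are hypothetical. To stay 
buildable without linking libatomic, it packs a 32-bit index and a 32-bit tag 
into one 64-bit word; the double-width case discussed here is the same idea 
with a full pointer plus a 64-bit tag in a 16-byte object. The last helper 
reads __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16, a compiler-specific predefined macro 
that, as far as I know, GCC and clang define when 16-byte compare-and-swap can 
be emitted inline (e.g. with mcx16); it is only a stand-in for the C11 
lock-free flag mentioned in item 5, not that flag itself.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not part of GCC or libatomic. */

/* Pack a 32-bit index and a 32-bit ABA tag into one 64-bit word. */
static inline uint64_t tagged_pack(uint32_t index, uint32_t tag)
{
    return ((uint64_t)tag << 32) | index;
}

static inline uint32_t tagged_index(uint64_t word)
{
    return (uint32_t)word;
}

static inline uint32_t tagged_tag(uint64_t word)
{
    return (uint32_t)(word >> 32);
}

/* Install new_index only if the whole (index, tag) word still equals
   `expected`. The tag is bumped on every success, so an index that was
   removed and later reused is not mistaken for the old one (ABA). */
static inline bool tagged_replace(_Atomic uint64_t *head, uint64_t expected,
                                  uint32_t new_index)
{
    uint64_t desired = tagged_pack(new_index, tagged_tag(expected) + 1);
    return atomic_compare_exchange_strong(head, &expected, desired);
}

/* Returns 1 if the compiler advertises inline 16-byte compare-and-swap
   for this translation unit, 0 otherwise (assumption: GCC/clang macro). */
static inline int have_inline_cas16(void)
{
#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16
    return 1;
#else
    return 0;
#endif
}
```

A caller would read the head with atomic_load, compute the new index, and 
retry tagged_replace with a fresh snapshot whenever it returns false.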


Cons of the proposed approach:

1. The compiler may place const atomic objects in .rodata. (Avoided by making 
sure _Atomic objects with size > 8 are not placed in .rodata, and by clarifying 
that casting arbitrary .rodata objects to double-width atomics is undefined 
and not allowed.)

2. Backward compatibility concerns if used outside glibc/IFUNC. Most likely, 
even in this case, not an issue since all calls there are already redirected to 
libatomic anyway, and statically-linked binaries will not interact with new 
binaries directly.

3. Read-only memory for atomic_load will not be supported for double-width 
types. But this is actually better than sweeping the problem under the carpet 
(the current behavior is even worse because it is inconsistent across 
platforms, e.g. it differs between x86-64 on Linux and arm64). Anyway, it is 
better to use a lock-based approach explicitly if, for whatever reason, it is 
preferable (read-only memory, performance (?), etc).
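
As a rough illustration of what "lock-based approach explicitly" could look 
like, here is a minimal sketch with hypothetical names. It assumes a single 
global spinlock built on C11 atomic_flag; the lock lives in writable memory, 
separate from the data, so the two-word value itself may sit in read-only 
memory. The load is visibly blocking, so nobody will mistake it for something 
safe to call from a signal handler.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical two-word value; names are illustrative only. */
struct wide_pair {
    uint64_t lo;
    uint64_t hi;
};

/* The lock is kept apart from the data it guards, so the data may be
   mapped read-only while the lock stays writable. */
static atomic_flag wide_pair_lock = ATOMIC_FLAG_INIT;

static inline void wide_pair_lock_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&wide_pair_lock,
                                             memory_order_acquire)) {
        /* spin until the current holder clears the flag */
    }
}

static inline void wide_pair_lock_release(void)
{
    atomic_flag_clear_explicit(&wide_pair_lock, memory_order_release);
}

/* Blocking load: only reads *src, never writes to it. */
static inline struct wide_pair wide_pair_load(const struct wide_pair *src)
{
    wide_pair_lock_acquire();
    struct wide_pair copy = *src;
    wide_pair_lock_release();
    return copy;
}

/* Blocking store: both words are updated under the same lock. */
static inline void wide_pair_store(struct wide_pair *dst,
                                   struct wide_pair value)
{
    wide_pair_lock_acquire();
    *dst = value;
    wide_pair_lock_release();
}
```

A single global lock is the simplest choice for a sketch; a real 
implementation would more likely hash the address into a table of locks.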
-- Ruslan