Torvald, thank you for your input. See my responses below.
On Monday, February 26, 2018 1:35 PM, Torvald Riegel <[email protected]>
wrote:
> ... does not imply this latter statement. The statement you cited is
> about what the standard itself requires, not what makes sense for a
> particular implementation.
True, but it makes sense to provide true atomics when they are available. Since
the standard seems to allow an atomic_load implementation that uses RMW, this
does not seem to be a problem.
In fact, the lock_free flag for this type can return true only if mcx16 is
specified; otherwise it returns false (since support can only be determined at
runtime, we have to assume the worst case).
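
To make the lock_free point concrete, here is a minimal sketch of querying the
compile-time answer. It assumes GCC or Clang (it relies on the
__atomic_always_lock_free builtin); pair128 and the two helper functions are
hypothetical names used only for illustration, not anything from libatomic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a double-width (16-byte) value on a 64-bit target. */
typedef struct { uint64_t lo, hi; } pair128;
static_assert(sizeof(pair128) == 16, "pair128 must be double-width");

/* 1 only if the compiler can promise, at compile time, lock-free access
   to every suitably aligned object of this size on the chosen target. */
int pair128_always_lock_free(void)
{
    return __atomic_always_lock_free(sizeof(pair128), 0);
}

/* The same query for a single-width type, for comparison. */
int word_always_lock_free(void)
{
    return __atomic_always_lock_free(sizeof(int), 0);
}
```

What the first function returns depends on the target, the compiler version and
flags such as mcx16, so no particular result is claimed here.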
> So, in such a case, using the wide CAS for
> atomic loads breaks a reasonable assumption. Moreover, it's also a
> special case, in that 32b atomics do work as intended.
But in this case the programmer is already assuming that atomic_load does not
use RMW, which C11 does not seem to guarantee. Of course, for single-width
operations, the programmer may assume it in most practical cases (even though
there is no guarantee).
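
For illustration, this is roughly what a load built from RMW looks like. It is
a minimal sketch using C11 <stdatomic.h>, with a 64-bit type standing in for
the double-width one so that it builds without libatomic. load_via_cas and
load_via_cas_demo are hypothetical helpers, not how gcc or libatomic actually
implement the operation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: read *obj using only a compare-and-swap. */
uint64_t load_via_cas(_Atomic uint64_t *obj)
{
    uint64_t expected = 0;
    /* If *obj == 0, the CAS succeeds and stores 0 back (no visible change).
       Otherwise it fails and copies the current value into expected.
       Either way, expected ends up holding the value of *obj. */
    (void)atomic_compare_exchange_strong(obj, &expected, 0);
    return expected;
}

/* Small self-check: the value is read and the object is left unchanged. */
int load_via_cas_demo(void)
{
    _Atomic uint64_t v = 42;
    uint64_t seen = load_via_cas(&v);
    assert(seen == 42);
    assert(atomic_load(&v) == 42);
    return 0;
}
```

Note that the CAS may write to the location even when the value does not
change, which is what distinguishes it from a plain load.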
Anyway, there is no good solution here for double-width operations, and the
programmer should not assume that a load without RMW is possible when writing
portable code. In fact, the lock-based solution is even more confusing and
potentially error-prone (e.g., it cannot be safely used inside signal handlers
since it is not lock-free).
> The behavior you favor would violate that, and
> there's no portable way to distinguish one from the other.
There is already a similar problem with IFUNC (when used with Linux and
glibc). In fact, I do not see any difference here. Redirecting to libatomic
when mcx16 is specified just adds extra cost and less predictable behavior.
Moreover, it seems counterintuitive: I specify a flag saying that mcx16 is
supported, but gcc still does not use it (at least not directly). It is possible
to change libatomic to always use cmpxchg16b when available (even on systems
without IFUNC); this way the behavior is totally consistent and binary
compatible for code compiled with and without mcx16.
> I see your point in wanting to have a builtin or such for the 64b atomic
> CAS. However, IMO, this doesn't fit into the world of C11/C++11
> atomics, and thus rather should be accessible through a separate
> interface.
Why not? If atomic_load is not really an issue, then it may be good to use the
standardized interface.