Re: [PATCH] Fix memory orders description in atomic ops built-ins docs.

Torvald Riegel Thu, 21 May 2015 11:27:08 -0700

On Thu, 2015-05-21 at 16:45 +0100, Matthew Wahab wrote:
> On 19/05/15 20:20, Torvald Riegel wrote:
> > On Mon, 2015-05-18 at 17:36 +0100, Matthew Wahab wrote:
> >> Hello,
> >>
> >> On 15/05/15 17:22, Torvald Riegel wrote:
> >>> This patch improves the documentation of the built-ins for atomic
> >>> operations.
> >>
> >> The "memory model" to "memory order" change does improve things but I
> >> think that the patch has some problems. As it is now, it makes some of
> >> the descriptions quite difficult to understand and seems to assume more
> >> familiarity with details of the C++11 specification than might be
> >> expected.
> >
> > I'd say that's a side effect of the C++11 memory model being the
> > reference specification of the built-ins.
> >
> >> Generally, the memory order descriptions seem to be targeted towards
> >> language designers but don't provide for anybody trying to understand
> >> how to implement or to use the built-ins.
> >
> > I agree that the current descriptions aren't a tutorial on the C++11
> > memory model.  However, given that the model is not GCC-specific, we
> > aren't really in a need to provide a tutorial, in the same way that we
> > don't provide a C++ tutorial.  Users can pick the C++11 memory model
> > educational material of their choice, and we need to document what's
> > missing to apply the C++11 knowledge to the built-ins we provide.
> >
> 
> We seem to have different views about the purpose of the manual page. I'm
> treating it as a description of the built-in functions provided by gcc to
> generate the code needed to implement the C++11 model. That is, the
> built-ins are distinct from C++11 and their descriptions should be, as far
> as possible, independent of the methods used in the C++11 specification to
> describe the C++11 memory model.


OK.  But we'd need a *precise* specification of what they do if we
wanted to make them separate from the C++11 memory model.  And we don't
have that, would you agree?

It's also not a trivial task, so I wouldn't be optimistic that someone
would offer to write such a specification, and have it cross-checked.

> I understand of course that the __atomics were added in order to support
> C++11 but that doesn't make them part of C++11 and, since __atomic
> functions can be made available when C11/C++11 may not be, it seems to
> make sense to try for stand-alone descriptions.

The compiler can very well provide the C++11 *memory model* without
creating any dependency on the other language/library pieces of C++11 or
C11.  Prior to C++11, multi-threaded executions were not defined by the
standard, so we're not conflicting with anything in prior language
standards, right?

Another way to see this is to say that we just *copy* the C++11 memory
model and use it as the memory model that specifies the behavior of the
atomic built-ins.  That additionally frees us from having to come up
with and maintain our GCC-specific specification of atomics and a memory
model.

> I'm also concerned that the patch, by describing things in terms of formal
> C++11 concepts, makes it more difficult for people to know what the
> built-ins can be expected to do and so makes the built-ins more difficult
> to use. There is a danger that rather than take a risk with uncertainty
> about the behaviour of the __atomics, people will fall back to the __sync
> functions simply because their expected behaviour is easier to work out.

I hadn't thought about that possible danger, but that would be right.
The way I would prefer to counter that is that we add a big fat warning
to the __sync built-ins that we don't have a precise specification for
them and that there are several corners of hand-waving and potentially
further issues, and that this is another reason to prefer the __atomic
built-ins.  PR 65697 etc. are enough indication for me that we indeed
lack a proper specification.

> I don't think that linking to external sites will help either, unless
> people already want to know C++11. Anybody who just wants to (e.g.) add a
> memory barrier will take one look at the __sync manual page and use the
> closest match from there instead.

Well, "just wants to add a memory barrier" is the start of the
problem.  In the same way that one needs to understand a hardware memory
model to pick the right HW instruction(s), one needs to understand a
programming language memory model to pick a fence and understand its
semantics.
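
To illustrate why "just add a barrier" is not enough, here is a rough
sketch (hypothetical names, and assuming the C++11 fence rules apply to
the built-ins as argued in this thread).  A fence orders nothing by
itself; it only has an effect when paired with atomic accesses on both
sides, so one has to know which accesses it is meant to pair with:

```cpp
#include <cassert>
#include <thread>

// Hypothetical variables, for illustration only.
static int fenced_data = 0;  // plain data handed from writer to reader
static int fenced_flag = 0;  // accessed only through __atomic built-ins

int run_fenced_handoff() {
    fenced_data = 0;
    __atomic_store_n(&fenced_flag, 0, __ATOMIC_RELAXED);
    int seen = -1;

    std::thread reader([&seen] {
        // Relaxed spin: the load alone gives no ordering for fenced_data.
        while (__atomic_load_n(&fenced_flag, __ATOMIC_RELAXED) == 0) {
            std::this_thread::yield();
        }
        // Acquire fence after the load pairs with the writer's release fence.
        __atomic_thread_fence(__ATOMIC_ACQUIRE);
        seen = fenced_data;
    });

    std::thread writer([] {
        fenced_data = 7;
        // Release fence before the flag store orders the data write first.
        __atomic_thread_fence(__ATOMIC_RELEASE);
        __atomic_store_n(&fenced_flag, 1, __ATOMIC_RELAXED);
    });

    writer.join();
    reader.join();
    assert(seen == 7);
    return seen;
}
```

Dropping either fence, or placing it on the wrong side of the flag
access, makes the read of fenced_data a data race even though "a
barrier" is still present in the code.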

> Note that none of this requires a tutorial of any kind. I'm just
> suggesting that the manual should describe what behaviour should be
> expected of the code generated for the functions. For the memory orders,
> that would mean describing what constraints need to be met by the
> generated code.

I'd bet that if one describes these constraints correctly, one gets a
large document -- even if one removes any introductory or explanatory
parts that could make it a tutorial.

It's fairly straightforward to describe several simple usage patterns
of the atomics (e.g., seq-cst ones, simple acquire/release pairs,
producer/consumer, etc.).  But describing the actual *constraints*
correctly ends up duplicating a specification.
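
As an example of what such a simple usage pattern looks like, here is a
minimal sketch of an acquire/release producer/consumer pair written with
the built-ins.  The names (payload, ready, run_message_passing) are made
up for illustration; this shows one pattern, not the constraints behind
it:

```cpp
#include <cassert>
#include <thread>

// Hypothetical variables, for illustration only.
static int payload = 0;  // plain data, published via `ready`
static int ready = 0;    // flag accessed only through __atomic built-ins

int run_message_passing() {
    payload = 0;
    __atomic_store_n(&ready, 0, __ATOMIC_RELAXED);
    int seen = -1;

    std::thread consumer([&seen] {
        // Spin until the acquire load observes the producer's release store.
        while (__atomic_load_n(&ready, __ATOMIC_ACQUIRE) == 0) {
            std::this_thread::yield();
        }
        // The write to payload happens-before this read.
        seen = payload;
    });

    std::thread producer([] {
        payload = 42;
        // The release store publishes the preceding write to payload.
        __atomic_store_n(&ready, 1, __ATOMIC_RELEASE);
    });

    producer.join();
    consumer.join();
    assert(seen == 42);
    return seen;
}
```

The pattern is easy to state; what is hard is saying precisely which
other transformations and executions remain allowed around it.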

You could certainly try to come up with a simple description of the
constraints, and we can iterate until I can't pick any holes in the
description anymore.  But I really don't think this would be a
worthwhile use of our time :)  It will certainly need more than a few
sentences to be bullet-proof.

If you haven't, please just look at
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4455.html
and try to specify the constraints just for the valid optimizations
described there.  This should be a good indication of why I think
specifying the reordering / behavior constraints is nontrivial.

> The requirement that the atomics should support C++11 could be met by
> making sure that the description of the expected behaviour is sufficient
> for C++11.

We don't just want the semantics of __atomic* to be sufficient for C++11
but also want them to be *as weak as possible* while still being
sufficient for C++11 -- otherwise, we'll make C++11 code less efficient
than it can be.  Thus, we want semantics that have the same strength as
what's needed by C++11.

> > There are several resources for implementers, for example the mappings
> > maintained by the Cambridge research group.  I guess it would be
> > sufficient to have such material on the wiki.  Is there something
> > specific that you'd like to see documented for implementers?
> > [...]
> > I agree it's not described in the manual, but we're implementing C++11.
> 
> (As above) I believe we're supporting the implementation of C++11 and that
> the distinction is important.
> 
> > However, I don't see why happens-before semantics wouldn't apply to
> > GCC's implementation of the built-ins; there may be cases where we
> > guarantee more, but if one uses the builtins in a way allowed by the C++11
> > model, one certainly gets behavior and happens-before relationships as
> > specified by C++11.
> >
> 
> My understanding is that happens-before is a relation used in the C++11
> specification for a specific meaning. I believe that it's used to decide
> whether something is or is not a data race

It appears in the data-race definition, right.  More generally, it is
the program-wide partial order regarding what virtually happens-before
what (as-if applies, of course) in a particular execution of a program.
It's at the core of actually describing how a multi-threaded program
behaves.

> so saying that it applies to a gcc built-in would be wrong.

Simplified, we can map 1:1 between an __atomic built-in and an
equivalent atomic operation.  The exception is basically the data
definitions for an atomic type (e.g., atomic<T>): While C++11 hides the
data and data type required for an atomically accessible variable, the
built-ins assume that the caller will target a suitable memory location.
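
A small sketch of that mapping (function names are hypothetical): the
built-in operates on a plain, suitably aligned int supplied by the
caller, while the C++11 operation goes through std::atomic<int>.  Both
take the same memory order and return the previous value:

```cpp
#include <atomic>
#include <cassert>

// Built-in form: the caller provides the memory location.
int fetch_add_with_builtin(int *counter, int delta) {
    return __atomic_fetch_add(counter, delta, __ATOMIC_SEQ_CST);
}

// C++11 form: the data and its type are hidden behind std::atomic<int>.
int fetch_add_with_std(std::atomic<int> &counter, int delta) {
    return counter.fetch_add(delta, std::memory_order_seq_cst);
}

// Runs both forms on the same starting value and compares the results.
void compare_forms() {
    int plain = 5;
    std::atomic<int> wrapped{5};
    int old_plain = fetch_add_with_builtin(&plain, 3);
    int old_wrapped = fetch_add_with_std(wrapped, 3);
    assert(old_plain == old_wrapped);
    assert(plain == wrapped.load(std::memory_order_seq_cst));
}
```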

> Using the gcc built-in rather than the equivalent C++11 library function
> would result in a program that C++11 regards as invalid. (Again, as I
> understand it.)

It wouldn't be invalid but simply not defined by C++11.  But that's fine
because the built-ins are a GCC-specific extension (which is compatible
with C++11 atomics, of course).

> >
> >>    @table  @code
> >>    @item __ATOMIC_RELAXED
> >> -No barriers or synchronization.
> >> +Implies no inter-thread ordering constraints.
> >>
> >> ====
> >> It may be useful to be explicit that there are no restrictions on code
> >> motion.
> >> ====
> >
> > But there are restrictions, for example those related to the forward
> > progress requirements or the coherency rules.
> 
> Those are C++11 restrictions, used in the formal description of the model.
> The expected behaviour of the HW instructions generated by the built-in
> doesn't require any restrictions to be imposed.

No, those show up at the HW level as well.  Consider examples of
spin-loops with memory_order_relaxed loads, or the coherency rule.  For
example:
  foo.store(1, memory_order_relaxed);
  foo.store(2, memory_order_relaxed);
  r = foo.load(memory_order_relaxed);
In the absence of other stores, r must never equal 1 (according to one of
the coherency rules).  If there were "no restrictions on code motion",
the load could be moved to before the stores, which isn't allowed.
Likewise, the assumption is that no hardware will do that either (or the
generated code has to enforce this through specific HW instructions).
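
For reference, a compilable version of the three-line example, once with
std::atomic and once with the built-ins on a plain int.  The function
names are made up for this sketch; it runs single-threaded, so there are
no other stores:

```cpp
#include <atomic>
#include <cassert>

// The example above, wrapped in a function.
int last_store_wins() {
    std::atomic<int> foo{0};
    foo.store(1, std::memory_order_relaxed);
    foo.store(2, std::memory_order_relaxed);
    int r = foo.load(std::memory_order_relaxed);
    // Coherence: the load is sequenced after both stores, so it cannot
    // return the overwritten value 1.
    assert(r != 1);
    return r;
}

// The same sequence written with the built-ins.
int last_store_wins_builtin() {
    int foo = 0;
    __atomic_store_n(&foo, 1, __ATOMIC_RELAXED);
    __atomic_store_n(&foo, 2, __ATOMIC_RELAXED);
    return __atomic_load_n(&foo, __ATOMIC_RELAXED);
}
```

Even with relaxed ordering everywhere, both functions have to return 2,
which is exactly a restriction on code motion.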

> >> Here and elsewhere:
> >> "Can prevent <motion> of code" is ambiguous: it doesn't say under what
> >> conditions code would or wouldn't be prevented from moving.
> >
> > Yes, it's not a specification but just an illustration of what can
> > result from the specification.
> 
> But it makes it difficult to know what behaviour to expect, making it
> difficult to use.
> 
> > [..] Describing which code movement is
> > allowed or not, precisely, is way too much detail IMO.  There are full
> > ISO C++ papers about that (N4455), and even those aren't enumerations of
> > all allowed code transformations.
> 
> A gcc built-in reduces to a code sequence to be executed by a single
> thread; describing the expected behaviour of that code sequence shouldn't
> be so difficult that it needs a paper. Note that this doesn't need a formal
> (in the sense of formal methods) description, just the sort of description
> that is normal for a compiler built-in.

The built-ins represent constraints for *both* generic code
transformations and arch-specific code generation.  They are not just
fancy ways to get certain instruction sequences.  Thus, what's discussed
in N4455 is very much relevant.
