On Thu, 2015-05-21 at 16:45 +0100, Matthew Wahab wrote:
> On 19/05/15 20:20, Torvald Riegel wrote:
> > On Mon, 2015-05-18 at 17:36 +0100, Matthew Wahab wrote:
> >> Hello,
> >>
> >> On 15/05/15 17:22, Torvald Riegel wrote:
> >>> This patch improves the documentation of the built-ins for atomic
> >>> operations.
> >>
> >> The "memory model" to "memory order" change does improve things but
> >> I think that the patch has some problems. As it is now, it makes
> >> some of the descriptions quite difficult to understand and seems to
> >> assume more familiarity with details of the C++11 specification
> >> than might be expected.
> >
> > I'd say that's a side effect of the C++11 memory model being the
> > reference specification of the built-ins.
> >
> >> Generally, the memory order descriptions seem to be targeted
> >> towards language designers but don't provide for anybody trying to
> >> understand how to implement or to use the built-ins.
> >
> > I agree that the current descriptions aren't a tutorial on the C++11
> > memory model. However, given that the model is not GCC-specific, we
> > aren't really in a need to provide a tutorial, in the same way that
> > we don't provide a C++ tutorial. Users can pick the C++11 memory
> > model educational material of their choice, and we need to document
> > what's missing to apply the C++11 knowledge to the built-ins we
> > provide.
> >
>
> We seem to have different views about the purpose of the manual page.
> I'm treating it as a description of the built-in functions provided by
> gcc to generate the code needed to implement the C++11 model. That is,
> the built-ins are distinct from C++11 and their descriptions should be,
> as far as possible, independent of the methods used in the C++11
> specification to describe the C++11 memory model.
OK. But we'd need a *precise* specification of what they do if we'd
want to make them separate from the C++11 memory model. And we don't
have that, would you agree? It's also not a trivial task, so I wouldn't
be optimistic that someone would offer to write such a specification,
and have it cross-checked.

> I understand of course that the __atomics were added in order to
> support C++11 but that doesn't make them part of C++11 and, since
> __atomic functions can be made available when C11/C++11 may not be, it
> seems to make sense to try for stand-alone descriptions.

The compiler can very well provide the C++11 *memory model* without
creating any dependency on the other language/library pieces of C++11
or C11. Prior to C++11, multi-threaded executions were not defined by
the standard, so we're not conflicting with anything in prior language
standards, right?

Another way to see this is to say that we just *copy* the C++11 memory
model and use it as the memory model that specifies the behavior of the
atomic built-ins. That additionally frees us from having to come up
with and maintain our GCC-specific specification of atomics and a
memory model.

> I'm also concerned that the patch, by describing things in terms of
> formal C++11 concepts, makes it more difficult for people to know what
> the built-ins can be expected to do and so make the built-in more
> difficult to use. There is a danger that rather than take a risk with
> uncertainty about the behaviour of the __atomics, people will
> fall-back to the __sync functions simply because their expected
> behaviour is easier to work out.

I hadn't thought about that possible danger, but that would be right.
The way I would prefer to counter that is that we add a big fat warning
to the __sync built-ins that we don't have a precise specification for
them and that there are several corners of hand-waving and potentially
further issues, and that this is another reason to prefer the __atomic
built-ins.
PR 65697 etc. are enough indication for me that we indeed lack a proper
specification.

> I don't think that linking to external sites will help either, unless
> people already want to know C++11. Anybody who just wants to (e.g.)
> add a memory barrier will take one look at the __sync manual page and
> use the closest match from there instead.

Well, "just wants to add a memory barrier" is the start of the problem.
The same way one needs to understand a hardware memory model to pick
the right HW instruction(s), one needs to understand a programming
language memory model to pick a fence and understand its semantics.

> Note that none of this requires a tutorial of any kind. I'm just
> suggesting that the manual should describe what behaviour should be
> expected of the code generated for the functions. For the memory
> orders, that would mean describing what constraints need to be met by
> the generated code.

I'd bet that if one describes these constraints correctly, you'll get a
large document -- even if one removes any introductory or explanatory
parts that could make it a tutorial.

It's fairly straightforward to describe several simple usage patterns
of the atomics (e.g., seq-cst ones, simple acquire/release pairs,
producer/consumer, etc.). But describing the actual *constraints*
correctly ends up duplicating a specification.

You could certainly try to come up with a simple description of the
constraints, and we can iterate until I can't pick any holes in the
description anymore. But I really don't think this would be a
worthwhile use of our time :) It will certainly need more than a few
sentences to be bullet-proof.

If you haven't, please just look at
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4455.html
and try to specify the constraints just for the valid optimizations
described there. This should be a good indication of why I think
specifying the reordering / behavior constraints is nontrivial.
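
To illustrate what I mean by a "simple usage pattern" (as opposed to
the constraints themselves), here is a minimal sketch of an
acquire/release pair written directly with the built-ins. All names
(payload, ready, producer, consumer, run_message_passing) are made up
for this example, and std::thread is only assumed as a convenient way
to get two threads; this is not meant as text for the manual.

```cpp
#include <cassert>
#include <thread>

// Hypothetical example data: a plain int published through an int flag.
static int payload = 0;  // ordinary, non-atomic data
static int ready = 0;    // only ever accessed through __atomic built-ins

static void producer() {
    payload = 42;                                   // plain store
    __atomic_store_n(&ready, 1, __ATOMIC_RELEASE);  // publish the data
}

static int consumer() {
    // Spin until the release store to the flag is observed.
    while (__atomic_load_n(&ready, __ATOMIC_ACQUIRE) == 0) {
    }
    // The acquire load read the value written by the release store, so
    // the store to payload happens-before this read: no data race.
    return payload;
}

// Runs one producer and one consumer and returns what the consumer saw.
int run_message_passing() {
    payload = 0;
    __atomic_store_n(&ready, 0, __ATOMIC_RELAXED);
    int seen = -1;
    std::thread c([&seen] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    assert(seen != -1);  // the consumer must have produced a value
    return seen;
}
```

Describing this one pattern is easy. Stating precisely which
reorderings around the two built-in calls are and are not allowed is
the part that ends up duplicating the specification.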
> The requirement that the atomics should support C++11 could be met by
> making sure that the description of the expected behaviour is
> sufficient for C++11.

We don't just want the semantics of __atomic* to be sufficient for
C++11 but also want them to be *as weak as possible* to be still
sufficient for C++11 -- otherwise, we'll make C++11 code less efficient
than it can be. Thus, we want semantics that have the same strength as
what's needed by C++11.

> > There are several resources for implementers, for example the
> > mappings maintained by the Cambridge research group. I guess it
> > would be sufficient to have such material on the wiki. Is there
> > something specific that you'd like to see documented for
> > implementers?
> > [...]
> > I agree it's not described in the manual, but we're implementing
> > C++11.
>
> (As above) I believe we're supporting the implementation of C++11 and
> that the distinction is important.
>
> > However, I don't see why happens-before semantics wouldn't apply to
> > GCC's implementation of the built-ins; there may be cases where we
> > guarantee more, but if one uses the builtins in a way allowed by the
> > C++11 model, one certainly gets behavior and happens-before
> > relationships as specified by C++11.
> >
>
> My understanding is that happens-before is a relation used in the
> C++11 specification for a specific meaning. I believe that it's used
> to decide whether something is or is not a data race

It appears in the data-race definition, right. More generally, it is
the program-wide partial order regarding what virtually happens-before
what (as-if applies, of course) in a particular execution of a program.
It's at the core of actually describing how a multi-threaded program
behaves.

> so saying that it applies to a gcc built-in would be wrong.

Simplified, we can map 1:1 between an __atomic built-in and an
equivalent atomic operation.
The exception is basically the data definitions for an atomic type
(e.g., atomic<T>): While C++11 hides the data and data type required
for an atomically accessible variable, the built-ins assume that the
caller will target a suitable memory location.

> Using the gcc built-in rather than the equivalent C++11 library
> function would result in program that C++11 regards as invalid.
> (Again, as I understand it.)

It wouldn't be invalid but simply not defined by C++11. But that's
fine because the built-ins are a GCC-specific extension (which is
compatible with C++11 atomics, of course).

> >
> >> @table @code
> >> @item __ATOMIC_RELAXED
> >> -No barriers or synchronization.
> >> +Implies no inter-thread ordering constraints.
> >>
> >> ====
> >> It may be useful to be explicit that there are no restrictions on
> >> code motion.
> >> ====
> >
> > But there are restrictions, for example those related to the forward
> > progress requirements or the coherency rules.
>
> Those are C++11 restrictions, used in the formal description of the
> model. The expected behaviour of the HW instructions generated by the
> built-in doesn't require any restrictions to be imposed.

No, those show up at the HW level as well. Consider examples of
spin-loops with memory_order_relaxed loads, or the coherency rule. For
example:

  foo.store(1, memory_order_relaxed);
  foo.store(2, memory_order_relaxed);
  r = foo.load(memory_order_relaxed);

In the absence of other stores, r must never equal 1 (according to one
of the coherency rules). If there were "no restrictions on code
motion", the load could be moved to before the stores, which isn't
allowed. Likewise, the assumption is that no hardware will do that
either (or the generated code has to enforce this through specific HW
instructions).

> >> Here and elsewhere:
> >> "Can prevent <motion> of code" is ambiguous: it doesn't say under
> >> what conditions code would or wouldn't be prevented from moving.
> >
> > Yes, it's not a specification but just an illustration of what can
> > result from the specification.
>
> But it makes it difficult to know what behaviour to expect, making it
> difficult to use.
>
> > [..] Describing which code movement is allowed or not, precisely,
> > is way too much detail IMO. There are full ISO C++ papers about
> > that (N4455), and even those aren't enumerations of all allowed code
> > transformations.
>
> A gcc built-in reduces to a code sequence to be executed by a single
> thread; describing the expected behaviour of that code sequence
> shouldn't be so difficult that it needs a paper. Note that this
> doesn't need a formal (in the sense of formal methods) description,
> just the sort of description that is normal for a compiler built-in.

The built-ins represent constraints for *both* generic code
transformations and arch-specific code generation. They are not just
fancy ways to get certain instruction sequences. Thus, what's
discussed in N4455 is very much relevant.
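
To make that concrete, here is the relaxed coherence example from
earlier in this message rewritten against the built-ins, as a rough
sketch. The function name relaxed_coherence and the use of a plain int
for foo are assumptions of this example only. Even with
__ATOMIC_RELAXED everywhere, neither the compiler's generic
transformations nor the target's code generation may let the load
observe the first store.

```cpp
#include <cassert>

// Hypothetical stand-in for the atomic variable "foo" used above; with
// the built-ins, the caller supplies a suitable memory location.
static int foo = 0;

// Two relaxed stores followed by a relaxed load of the same object.
int relaxed_coherence() {
    __atomic_store_n(&foo, 1, __ATOMIC_RELAXED);
    __atomic_store_n(&foo, 2, __ATOMIC_RELAXED);
    int r = __atomic_load_n(&foo, __ATOMIC_RELAXED);
    // In the absence of stores from other threads, coherence rules out
    // r == 1: the load may not be moved before the second store.
    assert(r != 1);
    return r;
}
```

A compiler is still free to merge the two stores or to forward the
value 2 to the load; those are exactly the kinds of transformations
N4455 discusses, and the built-ins have to permit them while forbidding
the reordering above.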