Re: _BitInt vs. _Atomic

Martin Uecker Tue, 01 Aug 2023 12:04:13 -0700

On Tuesday, 01.08.2023 at 15:54 +0000, Michael Matz wrote:
> Hello,
> 
> On Mon, 31 Jul 2023, Martin Uecker wrote:
> 
> > >  Say you have a loop like so:
> > > 
> > > _Atomic T obj;
> > > ...
> > > T expected1, expected2, newval;
> > > newval = ...;
> > > expected1 = ...;
> > > do {
> > >   expected2 = expected1;
> > >   if (atomic_compare_exchange_weak(&obj, &expected2, newval))
> > >     break;
> > >   expected1 = expected2;
> > > } while (1);
> > > 
> > > As written this looks of course stupid, and you may say "don't do that", 
> > > but internally the copies might result from temporaries (compiler 
> > > generated or wrapper function arguments, or suchlike).  Now, while 
> > > expected2 will contain the copied padding bits after the cmpxchg the 
> > > copies to and from expected1 will possibly destroy them.  Either way I 
> > > don't see why the above loop should be out-of-spec, so I can write it and 
> > > expect it to proceed eventually (certainly when the _strong variant is 
> > > used).  Any argument that would declare the above loop out-of-spec I 
> > > would consider a defect in the spec.
> > 
> > It is "out-of-spec" for C in the sense that it cannot be
> > expected to work with the semantics as specified in the C standard.
> 
> (I call that a defect.  See below)


This was extensively discussed in WG14 (before my time). In fact,
there was a defect report about the previous version defined in
terms of values and the wording was changed to memcmp / memcpy
operating on padding bytes (also to align with C++ at that time):

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2059.htm#dr_431
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1906.htm


> > In practice, what the semantics specified using memcpy/memcmp
> > allow one to do is to also apply atomic operations on non-atomic 
> > types.  This is not guaranteed to work by the C standard, but
> > in practice  people often have to do this.  For example, nobody
> > is going to copy a 256 GB numerical array with non-atomic types
> > into another data structure with atomic versions of the same
> > type just so that you can apply atomic operations on it.
> > So one simply does an unsafe cast and hopes the compiler does
> > not break this.
> > 
> > If the non-atomic struct now has non-zero values in the padding, 
> > and the compiler would clear those automatically for "expected", 
> > you would create the problem of an infinite loop (this time 
> > for real).
> 
> Only because cmpxchg is defined in terms of memcpy/memcmp. 

Yes, but this is intentional.

>  If it were 
> defined in terms of the == operator (obviously applied recursively 
> member-wise for structs) and simple-assignment that wouldn't be a problem. 

C has no == operator for structs, nor any concept of struct equality.

It would also cause implementation overhead, and I guess
it could cause severe performance issues when there are several
padding bytes distributed over an object and you need to
skip over them when copying or comparing.
(How would one vectorize this?)

> In addition that would get rid of all discussion of what happens or 
> doesn't happen with padding.  Introducing reliance on padding bits (which 
> IMHO goes against some fundamental ideas of the C standard) has 
> far-reaching consequences, see below. 

Working with representation bytes of objects is a rather
fundamental property of C. That you can do this using
character pointers, that you can copy objects with
memcpy, and that the results can be compared with memcmp is
something I expect to work in C.
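
To make the byte-level view concrete, here is a minimal sketch. The struct and
the function names are hypothetical, chosen only for illustration; the point is
that memcpy copies every byte of the object representation, padding included,
so a memcmp over the whole object afterwards reports equality.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type: on common ABIs there are padding bytes
   between tag and count. */
struct pair {
    char tag;
    int count;
};

/* Returns 1: memcpy copies all bytes of the representation,
   padding included, so memcmp over the whole object sees no
   difference between source and copy. */
int copy_preserves_representation(void)
{
    struct pair a, b;
    memset(&a, 0xAA, sizeof a);   /* fill every byte, padding too */
    a.tag = 'x';
    a.count = 1;
    memcpy(&b, &a, sizeof b);
    return memcmp(&a, &b, sizeof a) == 0;
}

/* Counts differing bytes by walking both objects through
   character pointers. */
size_t differing_bytes(const void *x, const void *y, size_t n)
{
    const unsigned char *p = x;
    const unsigned char *q = y;
    size_t diff = 0;
    for (size_t i = 0; i < n; i++)
        diff += (p[i] != q[i]);
    return diff;
}
```

Note that this says nothing about struct assignment (b = a), which is the
case where padding is not guaranteed to be carried over.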

>  The current definition of the 
> atomic_cmpxchg is also inconsistent with the rest of the standard:
> 
> We have:
> 
>   ... (C is non-atomic variant of A) ...
>   _Bool atomic_compare_exchange_strong(volatile A *object,
>                                        C *expected, C desired);
>   ... (is equivalent to atomic variant of:) 
>   if (memcmp(object, expected, sizeof (*object)) == 0)
>     { memcpy(object, &desired, sizeof (*object)); return true; }
>   else
>     { memcpy(expected, object, sizeof (*object)); return false; }
> 
> But we also have:
> 
>   The size, representation, and alignment of an atomic type need not be 
>   the same as those of the corresponding unqualified type.
> 
>   (with later text only suggesting that at least for atomic integer 
>   types these please be the same.  But here we aren't talking about
>   integer types even.)

Reading the old meeting minutes, it seems WG14 considered
the case that an atomic type could have a content part and
possibly a lock and you would compare only the content
part (with padding) and not the lock. But I agree, the
wording should be improved.


> 
> So, already the 'memcmp(object, expected, sizeof (*object))' may be 
> undefined.  sizeof(*object) need not be the same as sizeof(*expected).
> In particular the memcpy in the else branch might clobber memory outside 
> *expected.
> 
> That alone should be sufficient to show that defining this all in terms of 
> memcpy/memcmp is a bad idea.

I don't agree. But I agree the wording is not clear enough.


>   But it also has other 
> consequences: you can't copy (simple-assign) or compare (== operator) 
> atomic values anymore reliably and expect the atomic_cmpxchg to work.  My 
> example from earlier shows that you can't copy them, a similar one can be 
> constructed for breaking ==.

I am not sure I fully understand this.  

If you do not touch "expected", atomic_cmpxchg works as
intended. 

> But it goes further: you can also construct an example that shows an 
> internal inconsistency just with using atomic_cmpxchg (of course, assume 
> all this to run without concurrent accesses to the respective objects):
> 
>   _Atomic T obj;
>   ...
>   T expected, newval;
>   expected = ...;
>   newval = expected + 1;         // just to make it different
>   atomic_store (&obj, expected);
>   if (atomic_cmpxchg_strong (&obj, &expected, newval)) {
>     /* Now we have: obj == newval.
>        Do we also have memcmp(&obj,&newval)==0? */

There is no requirement for memcmp(...) == 0 because
"newval" is passed in as a value. The standard also
does not talk about the "content of memory" of newval,
while it does so for "obj" and "expected".

>     if (!atomic_cmpxchg_strong (&obj, &newval, expected)) {
>       /* No, we can't rely on that!  */
>       error("what's going on?");

Right, you can not rely on it. I personally do not find 
this surprising given the specification in C, but it
is certainly something one has to be aware of. 

But in a multi-threaded program, you could also not rely
on it, so I am not sure this is an issue.

>     }
>   } else {
>     /* May happen, padding of expected may not be the same
>        as in obj, even after atomic_store.  */
>     error("WTH? a compare after a store doesn't even work?");
>   }
> 
> So, even though cmpxchg is defined in terms of memcpy/memcmp, we still 
> can't rely on anything after it succeeded (or failed).  Simply because the 
> by-value passing of the 'desired' argument will have unknown padding 
> (within the implementation of cmpxchg) that isn't necessarily the same as 
> the newval object.

You can rely on expected being updated with the
representation of the "obj". So in a loop you can
rely on a successful store in the next iteration.
Also after a successful store the value stored in obj
and visible to other threads is the one in "desired".
This seems to cover the relevant use cases of cmpxchg.
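
A minimal sketch of the loop pattern that relies on exactly this. The struct,
the global and the function names are made up for illustration, and I assume
an implementation where an 8-byte _Atomic struct is lock-free. The key point
is that "expected" is only ever written by atomic_load and by the cmpxchg
itself, never by a plain struct copy, so after a failure it holds the
representation of "obj" and the next attempt can succeed.

```c
#include <stdatomic.h>

/* Hypothetical type with padding between tag and count. */
struct pair {
    char tag;
    int count;
};

_Atomic struct pair shared;

/* Store a fresh value into the shared object. */
void reset_shared(char tag, int count)
{
    struct pair p = { tag, count };
    atomic_store(&shared, p);
}

/* Read back the count member. */
int current_count(void)
{
    struct pair p = atomic_load(&shared);
    return p.count;
}

/* Increment shared.count; returns the number of cmpxchg attempts.
   "expected" is not touched between attempts, so after a failed
   attempt it contains whatever the cmpxchg copied out of "shared". */
int increment_count(void)
{
    int attempts = 0;
    struct pair expected = atomic_load(&shared);
    struct pair desired;
    do {
        desired = expected;   /* padding of desired does not matter */
        desired.count++;
        attempts++;
    } while (!atomic_compare_exchange_strong(&shared, &expected, desired));
    return attempts;
}
```

Without concurrent writers the first attempt may still fail because of
differing padding, but the following one is then expected to succeed.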

> 
> Now, about your suggestion of clearing or ignoring the padding bits at 
> specific points:  Clearly with current standard language (being explicit 
> about memcpy/memcmp and also within the text referring to 'contents of the 
> memory pointed to by ...', unlike earlier versions that at least still 
> talked about 'value of') padding bits cannot be ignored at the compare 
> within cmpxchg.

Yes.

>   Neither can anything be cleared from within cmpxchg (and 
> that's even ignoring that *expected and *object might have completely 
> different representations, as per above).

cmpxchg can clear the padding of "desired" before copying
it into "obj".

> 
> So, one idea was:
> 
> > A compiler could, for example, always clear the padding when
> > initializing or storing atomic values.
> 
> But that doesn't help the memcmp: even if *object has 
> cleared padding, *expected might not have (it's not of atomic type).  You 
> explicitly ruled out ignoring the padding on *expected for the memcmp due 
> to the above large-array example.  

Yes, an initial "expected" is not guaranteed to have cleared
padding, but if you read it via atomic_load or atomic_cmpxchg
it would have zero padding because all previous stores
to atomic types would have zero padding.

(One could also use memset to set the padding to zero
initially, if one really cares about spurious cycles).
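
A sketch of the memset approach, with a hypothetical struct and helper names.
Strictly speaking the standard lets padding take unspecified values whenever a
member is stored to, so this is something that works with the compilers I
would expect people to use, not a guarantee.

```c
#include <string.h>

/* Hypothetical type with padding between tag and count. */
struct pair {
    char tag;
    int count;
};

/* Zero the whole object first, padding included, then set the
   members.  Assumption: the member stores leave the padding
   bytes alone, which the standard does not promise. */
void pair_init_clean(struct pair *p, char tag, int count)
{
    memset(p, 0, sizeof *p);
    p->tag = tag;
    p->count = count;
}

/* Byte-wise comparison, i.e. what cmpxchg is specified to do. */
int pair_same_bytes(const struct pair *a, const struct pair *b)
{
    return memcmp(a, b, sizeof *a) == 0;
}
```

Two objects initialized this way with the same member values then compare
equal byte for byte, whatever was in their storage before.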

> To that end you suggested:
> 
> > It might also clear the padding of the initial "expected", when it is 
> > initialized or stored to.
> 
> But that amounts to magic, I don't see a way to do that: as far as the 
> compiler is concerned it's just a random object of an arbitrary 
> (non-atomic!) type coming from somewhere (that it's an argument to cmpxchg 
> eventually might not be visible).  It would have to clear _all_ padding 
> for all objects always, just because of the chance that it may eventually 
> be passed to atomic_cmpxchg.  That quite clearly can't be the intention.

No, that wasn't my intention. But if it is a local variable
that is then passed to cmpxchg, a compiler could clear the initial
padding as an optimization to prevent spurious cycles.

> 
> To be honest the whole passage that uses memcmp/memcpy within the 
> definition of atomic_compare_exchange since C11 reads like an attempt to 
> explain the semantics in simple terms, but failing to account for padding.  
> Initially that wasn't such a problem because the normative text still 
> said
>
>    Atomically, compares the value pointed to by object for equality 
>    with that in expected, and if true, replaces the value pointed to by 
>    object with desired, and if false, updates the value in expected
>    with the value pointed to by object.
> 
> Note: "value" not "value representation" or "memory pointed to".  It seems 
> eventually people noticed an inconsistency between the memcpy/memcmp Note 
> and the above text, and overcorrected this to also talk about "memory 
> pointed to".  But that then elevates padding to something observable, and 
> I think that is wrong and was done without sufficiently catering for the 
> consequences.

No, the new text was originally taken from C++ after discussing
this specific issue and was then kept intentionally. The document
making this change is:

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1906.htm

Later WG21 then changed their opinion again, apparently after
people forgot how it was supposed to work (cf. the incorrect
example in the WG21 document).

That people are misled by this may be a good reason
to change the semantics, but I have to say that the possibility of
using atomic operations on existing types is far more important
in practice. (C++ has an ugly workaround for this.)
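
For illustration, the kind of unsafe cast I mean looks roughly like this. The
function name is made up, and the whole thing rests on the assumption that
_Atomic long has the same size, alignment and representation as long, which
the C standard does not guarantee.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption, checked at compile time as far as it can be:
   the atomic and non-atomic types agree in size and alignment. */
_Static_assert(sizeof(_Atomic long) == sizeof(long),
               "atomic and plain long differ in size");
_Static_assert(_Alignof(_Atomic long) == _Alignof(long),
               "atomic and plain long differ in alignment");

/* Apply a compare-exchange to one element of an existing,
   non-atomic array.  Not guaranteed by the C standard; it
   relies on the compiler not breaking the cast. */
bool replace_if_equal(long *data, size_t i, long expected, long desired)
{
    _Atomic long *slot = (_Atomic long *)&data[i];
    return atomic_compare_exchange_strong(slot, &expected, desired);
}
```

The array itself stays an ordinary long array, so nothing has to be copied
into a separate atomic data structure.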


Martin
