On 11/11/20 2:55 AM, Jakub Jelinek wrote:
> On Wed, Nov 11, 2020 at 09:33:00AM +0100, Stefan Kanthak wrote:
>> Ouch: but that's not the point here; what matters is the undefined behaviour of
>>       ((u) & 0x000000ff) << 24
>>
>> 0x000000ff is a signed int, so (u) & 0x000000ff is signed too -- and producing
>> a negative value (or overflow) from the left-shift of a signed int, i.e.
>> shifting into (or beyond) the sign bit, is undefined behaviour!
> Only in some language dialects.
> It is caught by -fsanitize=shift.
> In C++20, if the shift count is within bounds, all signed as well as
> unsigned left shifts are well defined.
> In C99/C11 there is one extra rule:
> For signed x << y, in C99/C11, the following:
>      (unsigned) x >> (uprecm1 - y)
>      if non-zero, is undefined.
> and for C++11 to C++17 another one:
>   /* For signed x << y, in C++11 and later, the following:
>      x < 0 || ((unsigned) x >> (uprecm1 - y)) > 1
>      is undefined.  */
> So indeed, 0x80 << 24 is UB in C99/C11 and C++98, unclear in C89 and
> well defined in C++11 and later.  I don't know if C2X is considering
> mandating two's complement and making it well defined like C++20 did.
>
> Guess we should fix that, though because different languages have different
> rules, GCC itself, except for sanitization, doesn't consider it UB and only
> treats shifts by a negative value or by bitsize or more as UB.

Even if it's well defined by C++20, I don't think we can rely on those
semantics within libgcc2.  At best we might be able to claim C99 and we
might even be stuck at C89, regardless of how GCC treats a shift into
the sign bit.
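
For illustration, the two rules Jakub quotes can be written out as plain C
predicates.  This is only a sketch, not GCC's sanitizer code: the names
shl_undefined_c99, shl_undefined_cxx11 and UPRECM1 are made up here, int is
assumed to have no padding bits, and the shift count is assumed to already
be in bounds (0 <= y <= UPRECM1).

```c
#include <limits.h>
#include <stdbool.h>

/* Index of the sign bit of int ("uprecm1" in the quoted rules).
   Assumption: int has no padding bits.  */
#define UPRECM1 ((unsigned) (sizeof (int) * CHAR_BIT - 1))

/* C99/C11 rule as quoted: signed x << y is undefined if
   (unsigned) x >> (uprecm1 - y) is non-zero.
   Assumption: 0 <= y <= UPRECM1.  */
static bool
shl_undefined_c99 (int x, unsigned y)
{
  return ((unsigned) x >> (UPRECM1 - y)) != 0;
}

/* C++11 to C++17 rule as quoted: signed x << y is undefined if
   x < 0 || ((unsigned) x >> (uprecm1 - y)) > 1.
   Assumption: 0 <= y <= UPRECM1.  */
static bool
shl_undefined_cxx11 (int x, unsigned y)
{
  return x < 0 || ((unsigned) x >> (UPRECM1 - y)) > 1;
}
```

With a 32-bit int, shl_undefined_c99 (0x80, 24) is true while
shl_undefined_cxx11 (0x80, 24) is false, which matches the 0x80 << 24 case
discussed above.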

I'll do a bit of testing here, but I'm inclined to explicitly treat all
those constants as unsigned for the sake of consistency.  Thanks Stefan
for pointing out what I missed (shift into the sign bit).
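
Roughly what treating the constants as unsigned could look like, as a sketch
only: bswap32_sketch is a hypothetical name and this is not the actual
libgcc2 code or the final patch.  With the u suffix every mask is an
unsigned int, so each masked value is unsigned before it is shifted and
nothing ever shifts into a sign bit.

```c
#include <stdint.h>

/* Hypothetical 32-bit byte swap with all masks written as unsigned
   constants.  Each (u & MASKu) has unsigned type, so the left shifts
   are well defined in every C and C++ dialect discussed above.  */
static uint32_t
bswap32_sketch (uint32_t u)
{
  return ((u & 0xff000000u) >> 24)
	 | ((u & 0x00ff0000u) >> 8)
	 | ((u & 0x0000ff00u) << 8)
	 | ((u & 0x000000ffu) << 24);
}
```

For example, bswap32_sketch (0x12345678u) yields 0x78563412u.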

jeff
