On Wed, Mar 27, 2024 at 12:48:29PM +0100, Richard Biener wrote:
> > The following patch attempts to fix the (view_convert (convert@0 @1))
> > optimization.  If TREE_TYPE (@0) is a _BitInt type with padding bits
> > and @0 has the same precision as @1 and it has a different sign
> > and _BitInt with padding bits are extended on the target (x86 doesn't,
> > aarch64 doesn't, but arm plans to do that), then optimizing it to
> > just (view_convert @1) is wrong, the padding bits wouldn't be what
> > they should be.
> > E.g. bitint-64.c test with -O2 has
> >   _5 = (unsigned _BitInt(5)) _4;
> >   _7 = (unsigned _BitInt(5)) e.0_1;
> >   _8 = _5 + _7;
> >   _9 = (_BitInt(5)) _8;
> >   _10 = VIEW_CONVERT_EXPR<unsigned char>(_9);
> > and forwprop1 changes that to just
> >   _5 = (unsigned _BitInt(5)) _4;
> >   _7 = (unsigned _BitInt(5)) e.0_1;
> >   _8 = _5 + _7;
> >   _10 = VIEW_CONVERT_EXPR<unsigned char>(_8);
> > The former makes the padding bits well defined (at least on arm in
> > the future), while the latter has those bits zero extended.
> 
> So I think the existing rule, when extended to
> 
>            || (TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (TREE_TYPE (@1))
>                && TYPE_UNSIGNED (TREE_TYPE (@1))
> 
> assumes padding is extended according to the signedness.  But the
> original
> 
>        && (TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))
> 
> rule has the padding undefined.  I think that's inconsistent at least.

The whole simplification is just weird: all it tests is the precisions
of TREE_TYPE (@0) and TREE_TYPE (@1), but it never actually tests whether
there are any padding bits at all (i.e. it never compares TYPE_SIZE (type)
against those precisions).
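The missing guard could look something like the following pseudocode
sketch (GCC-internals style, hypothetical and untested; TYPE_SIZE is a
tree giving the size in bits, TYPE_PRECISION the value precision):

```
/* Hypothetical sketch of the missing check: the simplification only
   sidesteps the padding-bit question entirely when the precision fills
   the whole storage size, i.e. there are no padding bits at all.  */
bool no_padding0 = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (@0)))
                   == TYPE_PRECISION (TREE_TYPE (@0));
```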

I've tried to construct a non-_BitInt testcase with
struct S { signed long long a : 37; unsigned long long b : 37; } s;

unsigned long long
foo (long long x, long long y)
{
  __typeof (s.a + 0) c = x, d = y;
  __typeof (s.b + 0) e = (__typeof (s.b + 0)) (c + d);
  union U { __typeof (s.b + 0) f; unsigned long long g; } u;
  u.f = e;
  return u.g;
}

unsigned long long
bar (unsigned long long x, unsigned long long y)
{
  __typeof (s.b + 0) c = x, d = y;
  __typeof (s.a + 0) e = (__typeof (s.a + 0)) (c + d);
  union U { __typeof (s.a + 0) f; unsigned long long g; } u;
  u.f = e;
  return u.g;
}

int
main ()
{
  unsigned long long x = foo (-53245722422LL, -5719971797LL);
  unsigned long long y = bar (136917222900LL, 5719971797LL);
  __builtin_printf ("%016llx %016llx\n", x, y);
  return 0;
}
which seems to print
00000012455ee8f5 0000000135d6ddc9
at -O0 and
fffffff2455ee8f5 0000000135d6ddc9
at -O2/-O3.  I don't know whether in that case we consider those padding
bits uninitialized (anything can happen), in which case it is fine, or
something else.

> I'll note that we do not constrain 'type' so the V_C_E could
> re-interpret the integer (with or without padding) as floating-point.

Anyway, if we have partial-int types with undefined padding bits
(the above __typeof of a bitfield type, or _BitInt on x86/aarch64),
I guess the precision0 == precision1 case is ok: we have, say, 3 padding
bits before and after, and a valid program had better mask those bits
away or it is UB.
If precision0 == precision1, type1 has extended padding bits and type0
has undefined padding bits, then I guess it is also ok.
Other same-precision cases are not ok: they turn well-defined bits into
undefined ones, or change sign-extended bits to zero-extended ones or
vice versa.

For the precision0 > precision1 && unsigned1 case, on the other hand,
I don't see how it is ever correct in the undefined-padding-bits cases:
the padding bits above type0's precision are undefined before and after
the simplification, but the bits in between are changed from previously
zero to undefined.

For the precision0 > precision1 cases, if both type0 and type1 are
extended like _BitInt on arm, then the simplification is reasonable:
all the padding bits above type0's precision are zeros, and all the
padding bits between type1's and type0's precision are zeros too,
before and after.

So, indeed my patch isn't correct.

And because we don't have the _BitInt arm support for GCC 14, perhaps
we could defer that part to GCC 15, though I wonder if we shouldn't just
kill the TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (TREE_TYPE (@1))
&& TYPE_UNSIGNED (TREE_TYPE (@1)) case, because it is clearly wrong
for types with undefined padding bits (i.e. everything on trunk right now).

If I have
struct S { signed long long a : 42; unsigned long long b : 37; } s;
__typeof (s.b) v1 = ...;
__typeof (s.a) v2;
unsigned long long v3;
v2 = v1; // This zero extends the 5 bits
v3 = VCE <unsigned long long> (v2);
return v3 & 0x3ffffffffffULL;
then without the folding it used to be valid: those 5 bits were zeros.
With the optimization it is not.  Similarly for a VCE to a struct or
double and the like, and similarly for unsigned _BitInt(37) /
signed _BitInt(42) on x86/aarch64.

        Jakub
