On Wed, 27 Mar 2024, Jakub Jelinek wrote:

> On Wed, Mar 27, 2024 at 12:48:29PM +0100, Richard Biener wrote:
> > > The following patch attempts to fix the (view_convert (convert@0 @1))
> > > optimization.  If TREE_TYPE (@0) is a _BitInt type with padding bits
> > > and @0 has the same precision as @1 and it has a different sign
> > > and _BitInt with padding bits are extended on the target (x86 doesn't,
> > > aarch64 doesn't, but arm plans to do that), then optimizing it to
> > > just (view_convert @1) is wrong, the padding bits wouldn't be what
> > > it should be.
> > > E.g. bitint-64.c test with -O2 has
> > >   _5 = (unsigned _BitInt(5)) _4;
> > >   _7 = (unsigned _BitInt(5)) e.0_1;
> > >   _8 = _5 + _7;
> > >   _9 = (_BitInt(5)) _8;
> > >   _10 = VIEW_CONVERT_EXPR<unsigned char>(_9);
> > > and forwprop1 changes that to just
> > >   _5 = (unsigned _BitInt(5)) _4;
> > >   _7 = (unsigned _BitInt(5)) e.0_1;
> > >   _8 = _5 + _7;
> > >   _10 = VIEW_CONVERT_EXPR<unsigned char>(_8);
> > > The former makes the padding bits well defined (at least on arm in
> > > the future), while the latter has those bits zero extended.
> > 
> > So I think the existing rule, when extended to
> > 
> >            || (TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (TREE_TYPE 
> > (@1))
> >                && TYPE_UNSIGNED (TREE_TYPE (@1))
> > 
> > assumes padding is extended according to the signedness.  But the
> > original
> > 
> >        && (TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE 
> > (@1))
> > 
> > rule has the padding undefined.  I think that's inconsistent at least.
> 
> The whole simplification is just weird, all it tests are the precisions
> of TREE_TYPE (@0) and TREE_TYPE (@1), but doesn't actually test if there
> are padding bits (i.e. the TYPE_SIZE (type) vs. those precisions).
> 
> I've tried to construct a non-_BitInt testcase with
> struct S { signed long long a : 37; unsigned long long b : 37; } s;
> 
> unsigned long long
> foo (long long x, long long y)
> {
>   __typeof (s.a + 0) c = x, d = y;
>   __typeof (s.b + 0) e = (__typeof (s.b + 0)) (c + d);
>   union U { __typeof (s.b + 0) f; unsigned long long g; } u;
>   u.f = e;
>   return u.g;
> }
> 
> unsigned long long
> bar (unsigned long long x, unsigned long long y)
> {
>   __typeof (s.b + 0) c = x, d = y;
>   __typeof (s.a + 0) e = (__typeof (s.a + 0)) (c + d);
>   union U { __typeof (s.a + 0) f; unsigned long long g; } u;
>   u.f = e;
>   return u.g;
> }
> 
> int
> main ()
> {
>   unsigned long long x = foo (-53245722422LL, -5719971797LL);
>   unsigned long long y = bar (136917222900, 5719971797LL);
>   __builtin_printf ("%016llx %016llx\n", x, y);
>   return 0;
> }
> which seems to print
> 00000012455ee8f5 0000000135d6ddc9
> at -O0 and
> fffffff2455ee8f5 0000000135d6ddc9
> at -O2/-O3, dunno if in that case we consider those padding bits
> uninitialized/anything can happen, then it is fine, or something else.
> 
> > I'll note that we do not constrain 'type' so the V_C_E could
> > re-interpret the integer (with or without padding) as floating-point.
> 
> Anyway, if we have the partial int types with undefined padding bits
> (the above __typeof of a bitfield type or _BitInt on x86/aarch64),
> I guess the precision0 == precision1 case is ok, we have
> say 3 padding bits before/after, a valid program better should mask
> away those bits or it will be UB.
> If precision0 == precision1 and type1 has the extended padding bits
> and type0 has undefined padding bits, then I guess it is also ok.
> Other cases of same precision are not ok, turning well defined bits
> into undefined or changing sign extended to zero extended or vice versa.
> 
> For the precision0 > precision1 && unsigned1 case on the other side,
> I don't see how it is ever correct for the undefined padding bits cases,
> the padding bits above type0 are undefined before/after the simplification,
> but the bits in between are changed from previously zero to undefined.
> 
> For the precision0 > precision1 cases if both type0 and type1 are
> extended like _BitInt on arm, then the simplification is reasonable,
> all the padding bits above type0 are zeros and all the padding bits between
> type1 and type0 precision are zeros too, before and after.
> 
> So, indeed my patch isn't correct.
> 
> And because we don't have the _BitInt arm support for GCC 14, perhaps
> we could defer that part for GCC 15, though I wonder if we shouldn't just
> kill that TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (TREE_TYPE (@1))
> && TYPE_UNSIGNED (TREE_TYPE (@1)) case, because it is clearly wrong
> for the types with undefined padding bits (everything on trunk right now).

The comment says it was added for (char)_Bool, probably 
gcc.dg/tree-ssa/vce-1.c will fail.  But I'm not sure this optimization
is important, it seems early SRA introduces a V_C_E here and then
phiopt the conversion to unsigned char.  But this is a V_C_E of
_Bool -> unsigned char -> _Bool.  We seem to apply both this
pattern and then drop the V_C_E.

Maybe we can reasonably optimize the case where type == TREE_TYPE (@1)
and the intermediate precision is bigger or equal?

Richard.


> If I have
> struct S { signed long long a : 42; unsigned long long b : 37; } s;
> __typeof (s.b) v1 = ...;
> __typeof (s.a) v2;
> unsigned long long v3;
> v2 = v1; // This zero extends the 5 bits
> v3 = VCE <unsigned long long> (v2);
> return v3 & 0x3ffffffffffULL;
> then without the folding it used to be valid, those 5 bits were zeros,
> with the optimization it is not.  Similarly to VCE to struct or double
> and the like.  And similarly for unsigned _BitInt(37)/ signed _BitInt(42)
> on x86/aarch64.
> 
>       Jakub
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to