On Wed, 27 Mar 2024, Jakub Jelinek wrote: > On Wed, Mar 27, 2024 at 12:48:29PM +0100, Richard Biener wrote: > > > The following patch attempts to fix the (view_convert (convert@0 @1)) > > > optimization. If TREE_TYPE (@0) is a _BitInt type with padding bits > > > and @0 has the same precision as @1 and it has a different sign > > > and _BitInt with padding bits are extended on the target (x86 doesn't, > > > aarch64 doesn't, but arm plans to do that), then optimizing it to > > > just (view_convert @1) is wrong, the padding bits wouldn't be what > > > it should be. > > > E.g. bitint-64.c test with -O2 has > > > _5 = (unsigned _BitInt(5)) _4; > > > _7 = (unsigned _BitInt(5)) e.0_1; > > > _8 = _5 + _7; > > > _9 = (_BitInt(5)) _8; > > > _10 = VIEW_CONVERT_EXPR<unsigned char>(_9); > > > and forwprop1 changes that to just > > > _5 = (unsigned _BitInt(5)) _4; > > > _7 = (unsigned _BitInt(5)) e.0_1; > > > _8 = _5 + _7; > > > _10 = VIEW_CONVERT_EXPR<unsigned char>(_8); > > > The former makes the padding bits well defined (at least on arm in > > > the future), while the latter has those bits zero extended. > > > > So I think the existing rule, when extended to > > > > || (TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (TREE_TYPE > > (@1)) > > && TYPE_UNSIGNED (TREE_TYPE (@1)) > > > > assumes padding is extended according to the signedness. But the > > original > > > > && (TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE > > (@1)) > > > > rule has the padding undefined. I think that's inconsistent at least. > > The whole simplification is just weird, all it tests are the precisions > of TREE_TYPE (@0) and TREE_TYPE (@1), but doesn't actually test if there > are padding bits (i.e. the TYPE_SIZE (type) vs. those precisions). > > I've tried to construct a non-_BitInt testcase with > struct S { signed long long a : 37; unsigned long long b : 37; } s; > > unsigned long long > foo (long long x, long long y) > { > __typeof (s.a + 0) c = x, d = y; > __typeof (s.b + 0) e = (__typeof (s.b + 0)) (c + d); > union U { __typeof (s.b + 0) f; unsigned long long g; } u; > u.f = e; > return u.g; > } > > unsigned long long > bar (unsigned long long x, unsigned long long y) > { > __typeof (s.b + 0) c = x, d = y; > __typeof (s.a + 0) e = (__typeof (s.a + 0)) (c + d); > union U { __typeof (s.a + 0) f; unsigned long long g; } u; > u.f = e; > return u.g; > } > > int > main () > { > unsigned long long x = foo (-53245722422LL, -5719971797LL); > unsigned long long y = bar (136917222900, 5719971797LL); > __builtin_printf ("%016llx %016llx\n", x, y); > return 0; > } > which seems to print > 00000012455ee8f5 0000000135d6ddc9 > at -O0 and > fffffff2455ee8f5 0000000135d6ddc9 > at -O2/-O3, dunno if in that case we consider those padding bits > uninitialized/anything can happen, then it is fine, or something else. > > > I'll note that we do not constrain 'type' so the V_C_E could > > re-interpret the integer (with or without padding) as floating-point. > > Anyway, if we have the partial int types with undefined padding bits > (the above __typeof of a bitfield type or _BitInt on x86/aarch64), > I guess the precision0 == precision1 case is ok, we have > say 3 padding bits before/after, a valid program better should mask > away those bits or it will be UB. > If precision0 == precision1 and type1 has the extended padding bits > and type0 has undefined padding bits, then I guess it is also ok. > Other cases of same precision are not ok, turning well defined bits > into undefined or changing sign extended to zero extended or vice versa. > > For the precision0 > precision1 && unsigned1 case on the other side, > I don't see how it is ever correct for the undefined padding bits cases, > the padding bits above type0 are undefined before/after the simplification, > but the bits in between are changed from previously zero to undefined. > > For the precision0 > precision1 cases if both type0 and type1 are > extended like _BitInt on arm, then the simplification is reasonable, > all the padding bits above type0 are zeros and all the padding bits between > type1 and type0 precision are zeros too, before and after. > > So, indeed my patch isn't correct. > > And because we don't have the _BitInt arm support for GCC 14, perhaps > we could defer that part for GCC 15, though I wonder if we shouldn't just > kill that TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (TREE_TYPE (@1)) > && TYPE_UNSIGNED (TREE_TYPE (@1)) case, because it is clearly wrong > for the types with undefined padding bits (everything on trunk right now).
The comment says it was added for (char)_Bool, probably gcc.dg/tree-ssa/vce-1.c will fail. But I'm not sure this optimization is important, it seems early SRA introduces a V_C_E here and then phiopt the conversion to unsigned char. But this is a V_C_E of _Bool -> unsigned char -> _Bool. We seem to apply both this pattern and then drop the V_C_E. Maybe we can reasonably optimize the case where type == TREE_TYPE (@1) and the intermediate precision is bigger or equal? Richard. > If I have > struct S { signed long long a : 42; unsigned long long b : 37; } s; > __typeof (s.b) v1 = ...; > __typeof (s.a) v2; > unsigned long long v3; > v2 = v1; // This zero extends the 5 bits > v3 = VCE <unsigned long long> (v2); > return v3 & 0x3ffffffffffULL; > then without the folding it used to be valid, those 5 bits were zeros, > with the optimization it is not. Similarly to VCE to struct or double > and the like. And similarly for unsigned _BitInt(37)/ signed _BitInt(42) > on x86/aarch64. > > Jakub > > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)