https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115161
Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Sergei Trofimovich from comment #3)
> Looking at -O2's bug.cc.265t.optimized tree optimizations come up with
> unfolded saturated sub8:
>
>   _12 = __builtin_ia32_psubusb128 ({ -65, 0, 0, 0, -65, 0, 0, 0, -65, 0, 0,
> 0, -65, 0, 0, 0 }, { -99, 0, 0, 0, -99, 0, 0, 0, -99, 0, 0, 0, -99, 0, 0, 0
> });
>   _13 = __builtin_ia32_pminub128 (_12, { 32, 0, 0, 0, 32, 0, 0, 0, 32, 0, 0,
> 0, 32, 0, 0, 0 });
>   ...
>
>
> bug.cc.272r.cse1 still has that subtraction:
>
>     5: r119:V16QI=[`*.LC0']
>       REG_EQUAL const_vector
>     6: r120:V16QI=[`*.LC1']
>       REG_EQUAL const_vector
>     7: r118:V16QI=us_minus(r119:V16QI,r120:V16QI)
>
> bug.cc.273r.fwprop1 does not anymore:
>
>     3: NOTE_INSN_BASIC_BLOCK 2
>     2: NOTE_INSN_FUNCTION_BEG
>     9: r122:V16QI=[`*.LC2']
>       REG_EQUAL const_vector
>    13: r123:V4SI=r122:V16QI#0<<0x17
>       REG_EQUAL const_vector
>    16: r128:SI=0x5f800000
>    15: r127:V4SI=vec_duplicate(r128:SI)
>
> Could it be that constant folder "forgot" to generate anything for
> unsupported saturated-sub instead of leaving it as is?

No. It is normal constant folding on RTL (not done on GIMPLE because the
i386 backend doesn't try to gimple fold __builtin_ia32_psubusb128 or
__builtin_ia32_pminub128).
0xbf - 0x9d is 0x22, so in this case the us_minus behaves exactly like a
plain minus, and because 0x20 is smaller than that, the minimum is a vector
with 0x20 elements (plus min (0 - 0, 0) = 0 elements).
The reason the testcase FAILs is the same as in the other PRs: it is trying
to convert the
{0x0.8p+33f, 0x0.8p+33f, 0x0.8p+33f, 0x0.8p+33f}
V4SFmode vector to V4SImode, and because the backend sees the constant
operand of the fix, it folds it to the unspecified value, as with scalar
conversion.
Consider:

int
main ()
{
  volatile float f = 0x0.8p+33f;
  volatile float __attribute__((vector_size (16))) vf
    = { 0x0.8p+33f, 0x0.8p+33f, 0x0.8p+33f, 0x0.8p+33f };
  int a = f;
  int __attribute__((vector_size (16))) vi
    = __builtin_convertvector (vf, int __attribute__((vector_size (16))));
  __builtin_printf ("%d\n", a);
  __builtin_printf ("{%d, %d, %d, %d}\n", vi[0], vi[1], vi[2], vi[3]);
}

This prints
-2147483648
{-2147483648, -2147483648, -2147483648, -2147483648}
at -O0 or -O2, but with -O2 -Dvolatile= it prints
2147483647
{2147483647, 2147483647, 2147483647, 2147483647}
instead. Either is IMHO fine, the C standard doesn't specify what the
result of such an out-of-range conversion should be.

Now, whether such cases are also unspecified for _mm_cvttps_epi32 etc. is
debatable. The Intel spec obviously specifies what the CPU instructions do
even in those otherwise unspecified cases; the question is whether the
intrinsic must behave the same or whether those invalid conversions are
still unspecified.
If they were well defined when using the intrinsics, arguably the backend
shouldn't use FIX RTL but some UNSPEC, or should use the FIX RTL
conditionally:
(if_then_else:SI (argument_is_in_bounds) (fix arg) (const_int 0x80000000)).