[Bug target/115161] [15 Regression] highway-1.0.7 miscompilation of some SSE2 intrinsics

jakub at gcc dot gnu.org via Gcc-bugs Tue, 21 May 2024 07:53:21 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115161


Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Sergei Trofimovich from comment #3)
> Looking at -O2's bug.cc.265t.optimized tree optimizations come up with
> unfolded saturated sub8:
> 
>   _12 = __builtin_ia32_psubusb128 ({ -65, 0, 0, 0, -65, 0, 0, 0, -65, 0, 0,
> 0, -65, 0, 0, 0 }, { -99, 0, 0, 0, -99, 0, 0, 0, -99, 0, 0, 0, -99, 0, 0, 0
> });
>   _13 = __builtin_ia32_pminub128 (_12, { 32, 0, 0, 0, 32, 0, 0, 0, 32, 0, 0,
> 0, 32, 0, 0, 0 });
>   ...
> 
> 
> bug.cc.272r.cse1 still has that subtraction:
> 
>     5: r119:V16QI=[`*.LC0']
>       REG_EQUAL const_vector
>     6: r120:V16QI=[`*.LC1']
>       REG_EQUAL const_vector
>     7: r118:V16QI=us_minus(r119:V16QI,r120:V16QI)
> 
> bug.cc.273r.fwprop1 does not anymore:
> 
>     3: NOTE_INSN_BASIC_BLOCK 2
>     2: NOTE_INSN_FUNCTION_BEG
>     9: r122:V16QI=[`*.LC2']
>       REG_EQUAL const_vector
>    13: r123:V4SI=r122:V16QI#0<<0x17
>       REG_EQUAL const_vector
>    16: r128:SI=0x5f800000
>    15: r127:V4SI=vec_duplicate(r128:SI)
> 
> Could it be that constant folder "forgot" to generate anything for
> unsupported saturated-sub instead of leaving it as is?

No.  It is normal constant folding on RTL (not done on GIMPLE because
the i386 backend doesn't try to gimple fold __builtin_ia32_psubusb128
or __builtin_ia32_pminub128).  0xbf - 0x9d is 0x22, so in this case us_minus
behaves exactly like minus, and because 0x20 is smaller than that,
the minimum is a vector with 0x20 elements (plus min (0 - 0, 0) = 0 elements).
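
To make the arithmetic above easy to check by hand, here is a minimal scalar
sketch of one byte lane.  The helper names (us_minus_u8, umin_u8, fold_lane)
are made up for illustration and are not GCC internals; they only model what
psubusb and pminub compute per unsigned byte.

```c
#include <stdint.h>

/* Scalar model of one byte lane of psubusb (RTL us_minus):
   unsigned subtraction that clamps at 0 instead of wrapping.  */
static uint8_t
us_minus_u8 (uint8_t a, uint8_t b)
{
  return a > b ? (uint8_t) (a - b) : 0;
}

/* Scalar model of one byte lane of pminub (unsigned minimum).  */
static uint8_t
umin_u8 (uint8_t a, uint8_t b)
{
  return a < b ? a : b;
}

/* Hypothetical helper: fold one lane of the quoted GIMPLE, where the
   constants are printed as signed bytes (-65 is 0xbf, -99 is 0x9d).  */
static uint8_t
fold_lane (int8_t a, int8_t b, uint8_t limit)
{
  return umin_u8 (us_minus_u8 ((uint8_t) a, (uint8_t) b), limit);
}
```

With these definitions, fold_lane (-65, -99, 32) gives 0x20 and
fold_lane (0, 0, 0) gives 0, matching the folded constant vector.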

The reason the testcase FAILs is the same as in the other PRs: it is trying to
convert the V4SFmode vector
{0x0.8p+33f, 0x0.8p+33f, 0x0.8p+33f, 0x0.8p+33f}
to V4SImode.  Each element is 2^32, which does not fit in a 32-bit int, and
because the backend sees the constant operand of the fix, it folds the
conversion to an unspecified value, just as it does for a scalar conversion.

Consider:
int
main ()
{
  volatile float f = 0x0.8p+33f;
  volatile float __attribute__((vector_size (16))) vf
    = { 0x0.8p+33f, 0x0.8p+33f, 0x0.8p+33f, 0x0.8p+33f };
  int a = f;
  int __attribute__((vector_size (16))) vi
    = __builtin_convertvector (vf, int __attribute__((vector_size (16))));
  __builtin_printf ("%d\n", a);
  __builtin_printf ("{%d, %d, %d, %d}\n", vi[0], vi[1], vi[2], vi[3]);
}
This prints
-2147483648
{-2147483648, -2147483648, -2147483648, -2147483648}
at -O0 or -O2, but with -O2 -Dvolatile= prints
2147483647
{2147483647, 2147483647, 2147483647, 2147483647}
instead.
Either is IMHO fine; the C standard doesn't specify what the result of such a
conversion should be.
Now, whether such cases are also unspecified for _mm_cvttps_epi32 etc. is
debatable.  The Intel spec obviously specifies what the CPU instructions do
even in those otherwise unspecified cases; the question is whether the
intrinsic must behave the same or whether those invalid conversions are still
unspecified.
If they are well defined when using the intrinsics, arguably the backend
shouldn't use FIX RTL but some UNSPEC, or should use the FIX RTL conditionally
(if_then_else:SI (argument_is_in_bounds) (fix arg) (const_int 0x80000000)).
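
For illustration only, the conditional form above can be modelled per lane in
plain C roughly as follows.  The function name cvtt_lane is hypothetical, and
the sketch assumes a 32-bit int and IEEE single precision; it mirrors what the
CPU instruction returns (0x80000000 for out-of-range or NaN inputs) and is not
a proposal for how the backend pattern should actually be written.

```c
#include <stdint.h>

/* Scalar model of one lane of cvttps2dq: in-range inputs truncate toward
   zero, everything else (including NaN) yields 0x80000000.  */
static int32_t
cvtt_lane (float f)
{
  /* -0x1p+31f is exactly INT32_MIN and 0x1p+31f is the first float above
     INT32_MAX.  Both comparisons are false for NaN, so NaN falls through
     to the out-of-bounds result.  */
  if (f >= -0x1p+31f && f < 0x1p+31f)
    return (int32_t) f;
  return INT32_MIN;
}
```

Under this model cvtt_lane (0x0.8p+33f) is INT32_MIN, i.e. the -2147483648
printed by the volatile variant of the testcase, and the cast itself is only
ever evaluated for arguments where C defines the result.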
