On Mon, Jul 31, 2023 at 11:40 AM Richard Biener <rguent...@suse.de> wrote:
>
> On Sun, 30 Jul 2023, Uros Bizjak wrote:
>
> > Also introduce -m[no-]mmxfp-with-sse option to disable trapping V2SF
> > named patterns in order to avoid generation of partial vector V4SFmode
> > trapping instructions.
> >
> > The new option is enabled by default, because even with sanitization,
> > a small but consistent speed up of 2 to 3% with Polyhedron capacita
> > benchmark can be achieved vs. scalar code.
> >
> > Using -fno-trapping-math improves Polyhedron capacita runtime 8 to 9%
> > vs. scalar code. This is what clang does by default, as it defaults
> > to -fno-trapping-math.
>
> I like the new option, note you lack invoke.texi documentation where
> I'd also elaborate a bit on the interaction with -fno-trapping-math
> and the possible performance impact then NaNs or denormals leak
> into the upper halves and cross-reference -mdaz-ftz.

The attached doc patch adds the invoke.texi entry for the
-mmmxfp-with-sse option. It is written so that it also covers
half-float vectors. WDYT?

Uros.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index fa765d5a0dd..99093172abe 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1417,6 +1417,7 @@ See RS/6000 and PowerPC Options.
 -mcld -mcx16 -msahf -mmovbe -mcrc32 -mmwait
 -mrecip -mrecip=@var{opt}
 -mvzeroupper -mprefer-avx128 -mprefer-vector-width=@var{opt}
+-mmmxfp-with-sse
 -mmove-max=@var{bits} -mstore-max=@var{bits}
 -mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 -mavx
 -mavx2 -mavx512f -mavx512pf -mavx512er -mavx512cd -mavx512vl
@@ -33708,6 +33709,22 @@ This option instructs GCC to use 128-bit AVX instructions instead of
 This option instructs GCC to use @var{opt}-bit vector width in instructions
 instead of default on the selected platform.
 
+@opindex -mmmxfp-with-sse
+@item -mmmxfp-with-sse
+This option enables GCC to generate trapping floating-point operations on
+partial vectors, where vector elements reside in the low part of the 128-bit
+SSE register. Unless @option{-fno-trapping-math} is specified, the compiler
+guarantees correct trapping behavior by sanitizing all input operands to
+have zeroes in the upper part of the vector register. Note that by using
+built-in functions or inline assembly with partial vector arguments, NaNs,
+denormal or invalid values can leak into the upper part of the vector,
+causing possible performance issues when @option{-fno-trapping-math} is in
+effect. These issues can be mitigated by manually sanitizing the upper part
+of the partial vector argument register or by using @option{-mdaz-ftz} to set
+denormals-are-zero (DAZ) flag in the MXCSR register.
+
+This option is enabled by default.
+
 @opindex mmove-max
 @item -mmove-max=@var{bits}
 This option instructs GCC to set the maximum number of bits can be
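
To illustrate the kind of code the documented option affects, here is a
minimal sketch in C. It assumes the GCC generic vector extension (the
vector_size attribute); the type name v2sf and the function add_v2sf are
made up for this illustration and do not appear in the patch. On x86-64,
an 8-byte float vector like this is a partial vector in the sense of the
entry above: its two elements occupy the low half of a 128-bit SSE
register. Whether the addition is emitted as a packed instruction, and
whether the inputs are sanitized first, depends on the compiler version
and on the options discussed above, so it is best to inspect the
generated assembly rather than rely on this description.

```c
/* Hypothetical example, not part of the patch.
   v2sf is an 8-byte vector of two floats, declared with the GCC
   generic vector extension.  */
typedef float v2sf __attribute__ ((vector_size (8)));

/* Element-wise addition of two 2-element float vectors.  This is the
   sort of partial-vector floating-point operation the invoke.texi
   entry talks about.  */
v2sf
add_v2sf (v2sf a, v2sf b)
{
  return a + b;
}
```

Compiling this with -O2 -S, once with default options and once with
-mno-mmxfp-with-sse or -fno-trapping-math, should show the difference in
the code generated for add_v2sf, assuming a GCC build that includes this
option.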