On Mon, Jul 31, 2023 at 11:40 AM Richard Biener <rguent...@suse.de> wrote:
>
> On Sun, 30 Jul 2023, Uros Bizjak wrote:
>
> > Also introduce -m[no-]mmxfp-with-sse option to disable trapping V2SF
> > named patterns in order to avoid generation of partial vector V4SFmode
> > trapping instructions.
> >
> > The new option is enabled by default, because even with sanitization,
> > a small but consistent speed up of 2 to 3% with Polyhedron capacita
> > benchmark can be achieved vs. scalar code.
> >
> > Using -fno-trapping-math improves Polyhedron capacita runtime 8 to 9%
> > vs. scalar code.  This is what clang does by default, as it defaults
> > to -fno-trapping-math.
>
> I like the new option, note you lack invoke.texi documentation where
> I'd also elaborate a bit on the interaction with -fno-trapping-math
> and the possible performance impact when NaNs or denormals leak
> into the upper halves and cross-reference -mdaz-ftz.

The attached doc patch adds the invoke.texi entry for the -mmmxfp-with-sse
option. It is written so that it also covers half-float vectors. WDYT?
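
For a concrete case, here is a minimal sketch of the kind of partial-vector
code the option is about. The typedef and function name are made up for
illustration and are not part of the patch. Comparing the output of
"gcc -O2 -S" with and without -fno-trapping-math (or with
-mno-mmxfp-with-sse) should show whether the operands get sanitized; the
exact instructions depend on the target and GCC version, so none are claimed
here.

```c
/* Illustrative only: a two-element float vector (V2SFmode) fills just the
   low 64 bits of a 128-bit SSE register.  The names v2sf and v2sf_mul are
   hypothetical.  */
typedef float v2sf __attribute__ ((vector_size (8)));

/* Element-wise multiply written with GCC generic vector extensions, so the
   compiler is free to use a packed SSE instruction on the low half of the
   register.  noinline keeps the operation visible in the assembly.  */
__attribute__ ((noinline)) v2sf
v2sf_mul (v2sf a, v2sf b)
{
  return a * b;
}
```

Calling v2sf_mul on { 1.5f, 2.0f } and { 2.0f, 4.0f } yields { 3.0f, 8.0f }
whichever code path the compiler picks; only the handling of the unused upper
half of the register differs.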

Uros.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index fa765d5a0dd..99093172abe 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1417,6 +1417,7 @@ See RS/6000 and PowerPC Options.
 -mcld  -mcx16  -msahf  -mmovbe  -mcrc32 -mmwait
 -mrecip  -mrecip=@var{opt}
 -mvzeroupper  -mprefer-avx128  -mprefer-vector-width=@var{opt}
+-mmmxfp-with-sse
 -mmove-max=@var{bits} -mstore-max=@var{bits}
 -mmmx  -msse  -msse2  -msse3  -mssse3  -msse4.1  -msse4.2  -msse4  -mavx
 -mavx2  -mavx512f  -mavx512pf  -mavx512er  -mavx512cd  -mavx512vl
@@ -33708,6 +33709,22 @@ This option instructs GCC to use 128-bit AVX instructions instead of
 This option instructs GCC to use @var{opt}-bit vector width in instructions
 instead of default on the selected platform.
 
+@opindex mmmxfp-with-sse
+@item -mmmxfp-with-sse
+This option enables GCC to generate trapping floating-point operations on
+partial vectors, where vector elements reside in the low part of the 128-bit
+SSE register.  Unless @option{-fno-trapping-math} is specified, the compiler
+guarantees correct trapping behavior by sanitizing all input operands to
+have zeroes in the upper part of the vector register.  Note that by using
+built-in functions or inline assembly with partial vector arguments, NaNs,
+denormal or invalid values can leak into the upper part of the vector,
+causing possible performance issues when @option{-fno-trapping-math} is in
+effect.  These issues can be mitigated by manually sanitizing the upper part
+of the partial vector argument register or by using @option{-mdaz-ftz} to set
+the denormals-are-zero (DAZ) flag in the MXCSR register.
+
+This option is enabled by default.
+
 @opindex mmove-max
 @item -mmove-max=@var{bits}
 This option instructs GCC to set the maximum number of bits can be
