Re: [PATCH 1/2] allow contraction to synthetic single-element vector FMA

Alexander Monakov Tue, 24 Jun 2025 07:31:16 -0700

> > Thanks! Any thoughts on the other patch in the thread, about flipping
> > -ffp-contract from =fast to =on?
> 
> I can't find this mail, not in spam either, but I'm OK with such change if it
> comes with test coverage.


Ouch, let me reproduce it below. About test coverage, I'm not exactly sure what
you mean, so I'll try to explain my perspective.

We definitely have existing test coverage for production of FMAs, especially
on targets where FMA is in baseline ISA (aarch64, ia64, rs6000, maybe others),
but even x86 has a number of tests where fma is enabled with -m flags.  Some
of those tests will regress and the question is what to do about them.

On x86, I've provided a fix for one test (the patch you just approved), but
I can't fix everything and in the patch I'm changing the remaining tests to
pass -ffp-contract=fast.  But most of the tests are actually highlighting
missed optimizations, like SLP not able to form addsub FMAs from a pair of
fmadd+fmsub.

And the question of what to do on all the other targets remains.

--- 8< ---

>From 478d51bc46c925028f6beeee1782fcadb3a92a58 Mon Sep 17 00:00:00 2001
From: Alexander Monakov <amona...@ispras.ru>
Date: Mon, 12 May 2025 23:23:42 +0300
Subject: [PATCH] c-family: switch away from -ffp-contract=fast

Unrestricted FMA contraction, combined with optimizations that create
copies of expressions, is causing hard-to-debug issues such as PR106902.
Since we implement conformant contraction now, switch C and C++ from
-ffp-contract=fast to either =off (under -std=c[++]NN like before, also
for C++ now), or =on (under -std=gnu[++]NN).  Keep -ffp-contract=fast
when -funsafe-math-optimizations (or -ffast-math, -Ofast) is active.

In other words,

- -ffast-math: no change, unrestricted contraction like before;
- standards compliant mode for C: no change, no contraction;
- ditto, for C++: align with C (no contraction);
- otherwise, switch C and C++ from -ffp-contract=fast to =on.

gcc/c-family/ChangeLog:

        * c-opts.cc (c_common_post_options): Adjust handling of
        flag_fp_contract_mode.

gcc/ChangeLog:

        * doc/invoke.texi (-ffp-contract): Describe new defaults.
        (-funsafe-math-optimizations): Add -ffp-contract=fast.

gcc/testsuite/ChangeLog:

        * gcc.target/i386/pr81904.c: Add -ffp-contract=fast to flags.
        * gcc.target/i386/pr116979.c: Ditto.
        * gcc.target/i386/intrinsics_opt-1.c: Ditto.
        * gcc.target/i386/part-vect-fmaddsubhf-1.c: Ditto.
        * gcc.target/i386/avx512f-vect-fmaddsubXXXps.c: Ditto.
        * gcc.target/i386/avx512f-vect-fmaddsubXXXpd.c: Ditto.
        * gcc.target/i386/avx512f-vect-fmsubaddXXXps.c: Ditto.
        * gcc.target/i386/avx512f-vect-fmsubaddXXXpd.c: Ditto.
---
 gcc/c-family/c-opts.cc                                | 11 ++---------
 gcc/doc/invoke.texi                                   | 10 ++++++----
 .../gcc.target/i386/avx512f-vect-fmaddsubXXXpd.c      |  2 +-
 .../gcc.target/i386/avx512f-vect-fmaddsubXXXps.c      |  2 +-
 .../gcc.target/i386/avx512f-vect-fmsubaddXXXpd.c      |  2 +-
 .../gcc.target/i386/avx512f-vect-fmsubaddXXXps.c      |  2 +-
 gcc/testsuite/gcc.target/i386/intrinsics_opt-1.c      |  2 +-
 .../gcc.target/i386/part-vect-fmaddsubhf-1.c          |  2 +-
 gcc/testsuite/gcc.target/i386/pr116979.c              |  2 +-
 gcc/testsuite/gcc.target/i386/pr81904.c               |  2 +-
 10 files changed, 16 insertions(+), 21 deletions(-)

diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index 697518637d..1ae45e2b9a 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -877,15 +877,8 @@ c_common_post_options (const char **pfilename)
     flag_excess_precision = (flag_iso ? EXCESS_PRECISION_STANDARD
                             : EXCESS_PRECISION_FAST);
 
-  /* ISO C restricts floating-point expression contraction to within
-     source-language expressions (-ffp-contract=on, currently an alias
-     for -ffp-contract=off).  */
-  if (flag_iso
-      && !c_dialect_cxx ()
-      && (OPTION_SET_P (flag_fp_contract_mode)
-         == (enum fp_contract_mode) 0)
-      && flag_unsafe_math_optimizations == 0)
-    flag_fp_contract_mode = FP_CONTRACT_OFF;
+  if (!flag_unsafe_math_optimizations && !OPTION_SET_P (flag_fp_contract_mode))
+    flag_fp_contract_mode = flag_iso ? FP_CONTRACT_OFF : FP_CONTRACT_ON;
 
   /* C language modes before C99 enable -fpermissive by default, but
      only if -pedantic-errors is not specified.  Also treat
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 91b0a201e1..b3a77320ee 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -13070,16 +13070,17 @@ Disabled by default.
 @opindex ffp-contract
 @item -ffp-contract=@var{style}
 @option{-ffp-contract=off} disables floating-point expression contraction.
+This is the default for C and C++ in a standards compliant mode
+(@option{-std=c11}, @option{-std=c++11} or similar).
 @option{-ffp-contract=fast} enables floating-point expression contraction
 such as forming of fused multiply-add operations if the target has
 native support for them.
+This is the default for languages other than C and C++.
 @option{-ffp-contract=on} enables floating-point expression contraction
 if allowed by the language standard.  This is implemented for C and C++,
 where it enables contraction within one expression, but not across
 different statements.
-
-The default is @option{-ffp-contract=off} for C in a standards compliant mode
-(@option{-std=c11} or similar), @option{-ffp-contract=fast} otherwise.
+This is the default when compiling GNU dialects of C or C++.
 
 @opindex fomit-frame-pointer
 @item -fomit-frame-pointer
@@ -15567,7 +15568,8 @@ or ISO rules/specifications for math functions. It may, 
however,
 yield faster code for programs that do not require the guarantees
 of these specifications.
 Enables @option{-fno-signed-zeros}, @option{-fno-trapping-math},
-@option{-fassociative-math} and @option{-freciprocal-math}.
+@option{-fassociative-math}, @option{-freciprocal-math} and
+@option{-ffp-contract=fast}.
 
 The default is @option{-fno-unsafe-math-optimizations}.
 
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vect-fmaddsubXXXpd.c 
b/gcc/testsuite/gcc.target/i386/avx512f-vect-fmaddsubXXXpd.c
index 734f9e0144..2510d96937 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-vect-fmaddsubXXXpd.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vect-fmaddsubXXXpd.c
@@ -1,6 +1,6 @@
 /* { dg-do run } */
 /* { dg-require-effective-target avx512f } */
-/* { dg-options "-O3 -mfma -save-temps -mavx512f -mprefer-vector-width=512" } 
*/
+/* { dg-options "-O3 -mfma -save-temps -mavx512f -mprefer-vector-width=512 
-ffp-contract=fast" } */
 
 #include "fma-check.h"
 void __attribute__((noipa))
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vect-fmaddsubXXXps.c 
b/gcc/testsuite/gcc.target/i386/avx512f-vect-fmaddsubXXXps.c
index ae196c5ef4..6c3577ff49 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-vect-fmaddsubXXXps.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vect-fmaddsubXXXps.c
@@ -1,6 +1,6 @@
 /* { dg-do run } */
 /* { dg-require-effective-target avx512f } */
-/* { dg-options "-O3 -mavx512f -mprefer-vector-width=512 -save-temps" } */
+/* { dg-options "-O3 -mavx512f -mprefer-vector-width=512 -save-temps 
-ffp-contract=fast" } */
 
 #include "fma-check.h"
 void __attribute__((noipa))
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vect-fmsubaddXXXpd.c 
b/gcc/testsuite/gcc.target/i386/avx512f-vect-fmsubaddXXXpd.c
index cde76db175..bfea44e3e7 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-vect-fmsubaddXXXpd.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vect-fmsubaddXXXpd.c
@@ -1,6 +1,6 @@
 /* { dg-do run } */
 /* { dg-require-effective-target avx512f } */
-/* { dg-options "-O3 -mavx512f -mprefer-vector-width=512 -save-temps" } */
+/* { dg-options "-O3 -mavx512f -mprefer-vector-width=512 -save-temps 
-ffp-contract=fast" } */
 
 #include "fma-check.h"
 void __attribute__((noipa))
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vect-fmsubaddXXXps.c 
b/gcc/testsuite/gcc.target/i386/avx512f-vect-fmsubaddXXXps.c
index 59de39f411..ff4c475aa4 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-vect-fmsubaddXXXps.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vect-fmsubaddXXXps.c
@@ -1,6 +1,6 @@
 /* { dg-do run } */
 /* { dg-require-effective-target avx512f } */
-/* { dg-options "-O3 -mavx512f -mprefer-vector-width=512 -save-temps" } */
+/* { dg-options "-O3 -mavx512f -mprefer-vector-width=512 -save-temps 
-ffp-contract=fast" } */
 
 #include "fma-check.h"
 void __attribute__((noipa))
diff --git a/gcc/testsuite/gcc.target/i386/intrinsics_opt-1.c 
b/gcc/testsuite/gcc.target/i386/intrinsics_opt-1.c
index a75bf4e9ca..edd8a7d964 100644
--- a/gcc/testsuite/gcc.target/i386/intrinsics_opt-1.c
+++ b/gcc/testsuite/gcc.target/i386/intrinsics_opt-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mfma" } */
+/* { dg-options "-O2 -mfma -ffp-contract=fast" } */
 
 #include <emmintrin.h>
 
diff --git a/gcc/testsuite/gcc.target/i386/part-vect-fmaddsubhf-1.c 
b/gcc/testsuite/gcc.target/i386/part-vect-fmaddsubhf-1.c
index 051f992f66..2e7ca250c7 100644
--- a/gcc/testsuite/gcc.target/i386/part-vect-fmaddsubhf-1.c
+++ b/gcc/testsuite/gcc.target/i386/part-vect-fmaddsubhf-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2 -ffp-contract=fast" } */
 /* { dg-final { scan-assembler-times "vfmaddsub...ph\[ 
\t\]+\[^\n\]*%xmm\[0-9\]" 1 { target { ! ia32 } } } } */
 /* { dg-final { scan-assembler-times "vfmsubadd...ph\[ 
\t\]+\[^\n\]*%xmm\[0-9\]" 1 { target { ! ia32 } } } } */
 
diff --git a/gcc/testsuite/gcc.target/i386/pr116979.c 
b/gcc/testsuite/gcc.target/i386/pr116979.c
index 0d2a958af4..fa2022b8a9 100644
--- a/gcc/testsuite/gcc.target/i386/pr116979.c
+++ b/gcc/testsuite/gcc.target/i386/pr116979.c
@@ -1,6 +1,6 @@
 /* PR target/116979 */
 /* { dg-do compile } */
-/* { dg-options "-O2 -mfma -fvect-cost-model=unlimited" } */
+/* { dg-options "-O2 -mfma -fvect-cost-model=unlimited -ffp-contract=fast" } */
 /* { dg-final { scan-assembler "vfmaddsub(?:132|213|231)pd" } } */
 /* { dg-final { scan-assembler "vfmaddsub(?:132|213|231)ps" { target { ! ia32 
} } } } */
 
diff --git a/gcc/testsuite/gcc.target/i386/pr81904.c 
b/gcc/testsuite/gcc.target/i386/pr81904.c
index 9f5ad0bd95..ef07c8b08c 100644
--- a/gcc/testsuite/gcc.target/i386/pr81904.c
+++ b/gcc/testsuite/gcc.target/i386/pr81904.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-mavx512fp16 -mavx512vl -O2 -mprefer-vector-width=512" } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2 -mprefer-vector-width=512 
-ffp-contract=fast" } */
 /* { dg-final { scan-assembler-times "vfmaddsub...ph\[ 
\t\]+\[^\n\]*%zmm\[0-9\]" 1 } } */
 /* { dg-final { scan-assembler-times "vfmsubadd...ph\[ 
\t\]+\[^\n\]*%zmm\[0-9\]" 1 } } */
 
-- 
2.49.0

Re: [PATCH 1/2] allow contraction to synthetic single-element vector FMA

Reply via email to