Re: [PATCH 1/2] allow contraction to synthetic single-element vector FMA

Richard Biener Tue, 24 Jun 2025 09:02:37 -0700

On Tue, Jun 24, 2025 at 4:28 PM Alexander Monakov <amona...@ispras.ru> wrote:
>
> > > Thanks! Any thoughts on the other patch in the thread, about flipping
> > > -ffp-contract from =fast to =on?
> >
> > I can't find this mail, not in spam either, but I'm OK with such change if 
> > it
> > comes with test coverage.
>
> Ouch, let me reproduce it below. About test coverage, I'm not exactly sure 
> what
> you mean, so I'll try to explain my perspective.
>
> We definitely have existing test coverage for production of FMAs, especially
> on targets where FMA is in baseline ISA (aarch64, ia64, rs6000, maybe others),
> but even x86 has a number of tests where fma is enabled with -m flags.  Some
> of those tests will regress and the question is what to do about them.
>
> On x86, I've provided a fix for one test (the patch you just approved), but
> I can't fix everything and in the patch I'm changing the remaining tests to
> pass -ffp-contract=fast.  But most of the tests are actually highlighting
> missed optimizations, like SLP not able to form addsub FMAs from a pair of
> fmadd+fmsub.


I'd say we want to fix these kind of things before switching the default.  Can
you file bugreports for the distinct issues you noticed when adjusting the
testcases?  I suppose they are reproducible as well when using the C
fma() function directly?

Thanks,
Richard.

>
> And the question of what to do on all the other targets remains.
>
> --- 8< ---
>
> From 478d51bc46c925028f6beeee1782fcadb3a92a58 Mon Sep 17 00:00:00 2001
> From: Alexander Monakov <amona...@ispras.ru>
> Date: Mon, 12 May 2025 23:23:42 +0300
> Subject: [PATCH] c-family: switch away from -ffp-contract=fast
>
> Unrestricted FMA contraction, combined with optimizations that create
> copies of expressions, is causing hard-to-debug issues such as PR106902.
> Since we implement conformant contraction now, switch C and C++ from
> -ffp-contract=fast to either =off (under -std=c[++]NN like before, also
> for C++ now), or =on (under -std=gnu[++]NN).  Keep -ffp-contract=fast
> when -funsafe-math-optimizations (or -ffast-math, -Ofast) is active.
>
> In other words,
>
> - -ffast-math: no change, unrestricted contraction like before;
> - standards compliant mode for C: no change, no contraction;
> - ditto, for C++: align with C (no contraction);
> - otherwise, switch C and C++ from -ffp-contract=fast to =on.
>
> gcc/c-family/ChangeLog:
>
>         * c-opts.cc (c_common_post_options): Adjust handling of
>         flag_fp_contract_mode.
>
> gcc/ChangeLog:
>
>         * doc/invoke.texi (-ffp-contract): Describe new defaults.
>         (-funsafe-math-optimizations): Add -ffp-contract=fast.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/pr81904.c: Add -ffp-contract=fast to flags.
>         * gcc.target/i386/pr116979.c: Ditto.
>         * gcc.target/i386/intrinsics_opt-1.c: Ditto.
>         * gcc.target/i386/part-vect-fmaddsubhf-1.c: Ditto.
>         * gcc.target/i386/avx512f-vect-fmaddsubXXXps.c: Ditto.
>         * gcc.target/i386/avx512f-vect-fmaddsubXXXpd.c: Ditto.
>         * gcc.target/i386/avx512f-vect-fmsubaddXXXps.c: Ditto.
>         * gcc.target/i386/avx512f-vect-fmsubaddXXXpd.c: Ditto.
> ---
>  gcc/c-family/c-opts.cc                                | 11 ++---------
>  gcc/doc/invoke.texi                                   | 10 ++++++----
>  .../gcc.target/i386/avx512f-vect-fmaddsubXXXpd.c      |  2 +-
>  .../gcc.target/i386/avx512f-vect-fmaddsubXXXps.c      |  2 +-
>  .../gcc.target/i386/avx512f-vect-fmsubaddXXXpd.c      |  2 +-
>  .../gcc.target/i386/avx512f-vect-fmsubaddXXXps.c      |  2 +-
>  gcc/testsuite/gcc.target/i386/intrinsics_opt-1.c      |  2 +-
>  .../gcc.target/i386/part-vect-fmaddsubhf-1.c          |  2 +-
>  gcc/testsuite/gcc.target/i386/pr116979.c              |  2 +-
>  gcc/testsuite/gcc.target/i386/pr81904.c               |  2 +-
>  10 files changed, 16 insertions(+), 21 deletions(-)
>
> diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
> index 697518637d..1ae45e2b9a 100644
> --- a/gcc/c-family/c-opts.cc
> +++ b/gcc/c-family/c-opts.cc
> @@ -877,15 +877,8 @@ c_common_post_options (const char **pfilename)
>      flag_excess_precision = (flag_iso ? EXCESS_PRECISION_STANDARD
>                              : EXCESS_PRECISION_FAST);
>
> -  /* ISO C restricts floating-point expression contraction to within
> -     source-language expressions (-ffp-contract=on, currently an alias
> -     for -ffp-contract=off).  */
> -  if (flag_iso
> -      && !c_dialect_cxx ()
> -      && (OPTION_SET_P (flag_fp_contract_mode)
> -         == (enum fp_contract_mode) 0)
> -      && flag_unsafe_math_optimizations == 0)
> -    flag_fp_contract_mode = FP_CONTRACT_OFF;
> +  if (!flag_unsafe_math_optimizations && !OPTION_SET_P 
> (flag_fp_contract_mode))
> +    flag_fp_contract_mode = flag_iso ? FP_CONTRACT_OFF : FP_CONTRACT_ON;
>
>    /* C language modes before C99 enable -fpermissive by default, but
>       only if -pedantic-errors is not specified.  Also treat
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 91b0a201e1..b3a77320ee 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -13070,16 +13070,17 @@ Disabled by default.
>  @opindex ffp-contract
>  @item -ffp-contract=@var{style}
>  @option{-ffp-contract=off} disables floating-point expression contraction.
> +This is the default for C and C++ in a standards compliant mode
> +(@option{-std=c11}, @option{-std=c++11} or similar).
>  @option{-ffp-contract=fast} enables floating-point expression contraction
>  such as forming of fused multiply-add operations if the target has
>  native support for them.
> +This is the default for languages other than C and C++.
>  @option{-ffp-contract=on} enables floating-point expression contraction
>  if allowed by the language standard.  This is implemented for C and C++,
>  where it enables contraction within one expression, but not across
>  different statements.
> -
> -The default is @option{-ffp-contract=off} for C in a standards compliant mode
> -(@option{-std=c11} or similar), @option{-ffp-contract=fast} otherwise.
> +This is the default when compiling GNU dialects of C or C++.
>
>  @opindex fomit-frame-pointer
>  @item -fomit-frame-pointer
> @@ -15567,7 +15568,8 @@ or ISO rules/specifications for math functions. It 
> may, however,
>  yield faster code for programs that do not require the guarantees
>  of these specifications.
>  Enables @option{-fno-signed-zeros}, @option{-fno-trapping-math},
> -@option{-fassociative-math} and @option{-freciprocal-math}.
> +@option{-fassociative-math}, @option{-freciprocal-math} and
> +@option{-ffp-contract=fast}.
>
>  The default is @option{-fno-unsafe-math-optimizations}.
>
> diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vect-fmaddsubXXXpd.c 
> b/gcc/testsuite/gcc.target/i386/avx512f-vect-fmaddsubXXXpd.c
> index 734f9e0144..2510d96937 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512f-vect-fmaddsubXXXpd.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512f-vect-fmaddsubXXXpd.c
> @@ -1,6 +1,6 @@
>  /* { dg-do run } */
>  /* { dg-require-effective-target avx512f } */
> -/* { dg-options "-O3 -mfma -save-temps -mavx512f -mprefer-vector-width=512" 
> } */
> +/* { dg-options "-O3 -mfma -save-temps -mavx512f -mprefer-vector-width=512 
> -ffp-contract=fast" } */
>
>  #include "fma-check.h"
>  void __attribute__((noipa))
> diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vect-fmaddsubXXXps.c 
> b/gcc/testsuite/gcc.target/i386/avx512f-vect-fmaddsubXXXps.c
> index ae196c5ef4..6c3577ff49 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512f-vect-fmaddsubXXXps.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512f-vect-fmaddsubXXXps.c
> @@ -1,6 +1,6 @@
>  /* { dg-do run } */
>  /* { dg-require-effective-target avx512f } */
> -/* { dg-options "-O3 -mavx512f -mprefer-vector-width=512 -save-temps" } */
> +/* { dg-options "-O3 -mavx512f -mprefer-vector-width=512 -save-temps 
> -ffp-contract=fast" } */
>
>  #include "fma-check.h"
>  void __attribute__((noipa))
> diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vect-fmsubaddXXXpd.c 
> b/gcc/testsuite/gcc.target/i386/avx512f-vect-fmsubaddXXXpd.c
> index cde76db175..bfea44e3e7 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512f-vect-fmsubaddXXXpd.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512f-vect-fmsubaddXXXpd.c
> @@ -1,6 +1,6 @@
>  /* { dg-do run } */
>  /* { dg-require-effective-target avx512f } */
> -/* { dg-options "-O3 -mavx512f -mprefer-vector-width=512 -save-temps" } */
> +/* { dg-options "-O3 -mavx512f -mprefer-vector-width=512 -save-temps 
> -ffp-contract=fast" } */
>
>  #include "fma-check.h"
>  void __attribute__((noipa))
> diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vect-fmsubaddXXXps.c 
> b/gcc/testsuite/gcc.target/i386/avx512f-vect-fmsubaddXXXps.c
> index 59de39f411..ff4c475aa4 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512f-vect-fmsubaddXXXps.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512f-vect-fmsubaddXXXps.c
> @@ -1,6 +1,6 @@
>  /* { dg-do run } */
>  /* { dg-require-effective-target avx512f } */
> -/* { dg-options "-O3 -mavx512f -mprefer-vector-width=512 -save-temps" } */
> +/* { dg-options "-O3 -mavx512f -mprefer-vector-width=512 -save-temps 
> -ffp-contract=fast" } */
>
>  #include "fma-check.h"
>  void __attribute__((noipa))
> diff --git a/gcc/testsuite/gcc.target/i386/intrinsics_opt-1.c 
> b/gcc/testsuite/gcc.target/i386/intrinsics_opt-1.c
> index a75bf4e9ca..edd8a7d964 100644
> --- a/gcc/testsuite/gcc.target/i386/intrinsics_opt-1.c
> +++ b/gcc/testsuite/gcc.target/i386/intrinsics_opt-1.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -mfma" } */
> +/* { dg-options "-O2 -mfma -ffp-contract=fast" } */
>
>  #include <emmintrin.h>
>
> diff --git a/gcc/testsuite/gcc.target/i386/part-vect-fmaddsubhf-1.c 
> b/gcc/testsuite/gcc.target/i386/part-vect-fmaddsubhf-1.c
> index 051f992f66..2e7ca250c7 100644
> --- a/gcc/testsuite/gcc.target/i386/part-vect-fmaddsubhf-1.c
> +++ b/gcc/testsuite/gcc.target/i386/part-vect-fmaddsubhf-1.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
> +/* { dg-options "-mavx512fp16 -mavx512vl -O2 -ffp-contract=fast" } */
>  /* { dg-final { scan-assembler-times "vfmaddsub...ph\[ 
> \t\]+\[^\n\]*%xmm\[0-9\]" 1 { target { ! ia32 } } } } */
>  /* { dg-final { scan-assembler-times "vfmsubadd...ph\[ 
> \t\]+\[^\n\]*%xmm\[0-9\]" 1 { target { ! ia32 } } } } */
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr116979.c 
> b/gcc/testsuite/gcc.target/i386/pr116979.c
> index 0d2a958af4..fa2022b8a9 100644
> --- a/gcc/testsuite/gcc.target/i386/pr116979.c
> +++ b/gcc/testsuite/gcc.target/i386/pr116979.c
> @@ -1,6 +1,6 @@
>  /* PR target/116979 */
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -mfma -fvect-cost-model=unlimited" } */
> +/* { dg-options "-O2 -mfma -fvect-cost-model=unlimited -ffp-contract=fast" } 
> */
>  /* { dg-final { scan-assembler "vfmaddsub(?:132|213|231)pd" } } */
>  /* { dg-final { scan-assembler "vfmaddsub(?:132|213|231)ps" { target { ! 
> ia32 } } } } */
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr81904.c 
> b/gcc/testsuite/gcc.target/i386/pr81904.c
> index 9f5ad0bd95..ef07c8b08c 100644
> --- a/gcc/testsuite/gcc.target/i386/pr81904.c
> +++ b/gcc/testsuite/gcc.target/i386/pr81904.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-mavx512fp16 -mavx512vl -O2 -mprefer-vector-width=512" } */
> +/* { dg-options "-mavx512fp16 -mavx512vl -O2 -mprefer-vector-width=512 
> -ffp-contract=fast" } */
>  /* { dg-final { scan-assembler-times "vfmaddsub...ph\[ 
> \t\]+\[^\n\]*%zmm\[0-9\]" 1 } } */
>  /* { dg-final { scan-assembler-times "vfmsubadd...ph\[ 
> \t\]+\[^\n\]*%zmm\[0-9\]" 1 } } */
>
> --
> 2.49.0
>

Re: [PATCH 1/2] allow contraction to synthetic single-element vector FMA

Reply via email to