On Wed, Mar 4, 2026 at 9:20 PM Andrew Pinski
<[email protected]> wrote:
>
> On Wed, Mar 4, 2026 at 12:12 AM
> <[email protected]> wrote:
> >
> > From: Abhishek Kaushik <[email protected]>
> >
> > The FMA folds in match.pd currently only match (negate @0) directly.
> > When the negated operand is wrapped in a type conversion
> > (e.g. (convert (negate @0))), the simplification to IFN_FNMA does not
> > trigger.
> >
> > This prevents folding of patterns such as:
> >
> > *c = *c - (v8u)(*a * *b);
> >
> > when the multiply operands undergo vector type conversions before being
> > passed to FMA. In such cases the expression lowers to neg + mla instead
> > of the more optimal msb on AArch64 SVE, because the canonicalization
> > step cannot see through the casts.
> >
> > Extend the match pattern to allow optional conversions on the negated
> > operand and the second multiplicand:
> >
> > (fmas:c (convert? (negate @0)) (convert? @1) @2)
> >
> > and explicitly rebuild the converted operands in the IFN_FNMA
> > replacement. This enables recognition of the subtraction-of-product form
> > even when vector element type casts are present.
> >
> > With this change, AArch64 SVE code generation is able to select msb
> > instead of emitting a separate neg followed by mla.
> >
> > This patch was bootstrapped and regression tested on aarch64-linux-gnu.
> >
> > gcc/
> >         PR target/123897
> >         * match.pd: Allow optional conversions in FMA-to-FNMA
> >         canonicalization and reconstruct converted operands in
> >         the replacement.
> >
> > gcc/testsuite/
> >         PR target/123897
> >         * gcc.target/aarch64/sve/fnma_match.c: New test.
> >         * gcc.target/aarch64/sve/pr123897.c: Scan for FNMA in the
> >         tree dump.
> > ---
> >  gcc/match.pd                                  |  4 +--
> >  .../gcc.target/aarch64/sve/fnma_match.c       | 28 +++++++++++++++++++
> >  .../gcc.target/aarch64/sve/pr123897.c         |  3 +-
> >  3 files changed, 32 insertions(+), 3 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/fnma_match.c
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 7f16fd4e081..4cce9463f8f 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -10255,8 +10255,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >  (if (canonicalize_math_after_vectorization_p ())
> >   (for fmas (FMA)
> >    (simplify
> > -   (fmas:c (negate @0) @1 @2)
> > -   (IFN_FNMA @0 @1 @2))
> > +   (fmas:c (convert? (negate @0)) (convert? @1) @2)
> > +   (IFN_FNMA (convert @0) (convert @1) @2))
>
> I think you need to check that the conversions are nop conversions
> rather than matching any convert.
> So using nop_convert here would be better than adding a separate
> tree_nop_conversion_p check.
> Can you check whether using nop_convert would work?

Also, this should probably reuse the matched conversion instead of
building a new one, like:

> > +   (fmas:c (convert? (negate @0)) (convert?@11 @1) @2)
> > +   (IFN_FNMA (convert @0) @11 @2))
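
For concreteness, combining that with Andrew's nop_convert suggestion
might look roughly like the following (an untested sketch; the capture
number and exact spelling are illustrative only, and it does not yet
address the signedness concern discussed below):

```
(simplify
 (fmas:c (nop_convert? (negate @0)) (nop_convert?@11 @1) @2)
 (IFN_FNMA (convert @0) @11 @2))
```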

If the negate was done in an unsigned type, the replacement now
converts @0 to signed and negates there, which might introduce UB.
So IMO you also need to pun the whole IFN_FNMA to operate on an
unsigned type (for integer-type FMA).

Richard.

>
> Thanks,
> Andrew
>
>
> >    (simplify
> >     (fmas @0 @1 (negate @2))
> >     (IFN_FMS @0 @1 @2))
> > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fnma_match.c b/gcc/testsuite/gcc.target/aarch64/sve/fnma_match.c
> > new file mode 100644
> > index 00000000000..08607b172e2
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/sve/fnma_match.c
> > @@ -0,0 +1,28 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -march=armv9-a -msve-vector-bits=256" } */
> > +
> > +typedef __attribute__((__vector_size__(sizeof(int)*8))) signed int v8i;
> > +typedef __attribute__((__vector_size__(sizeof(int)*8))) unsigned int v8u;
> > +
> > +void g(v8i *a,v8i *b,v8u *c)
> > +{
> > +  *c = *c - (v8u)(*a * *b);
> > +}
> > +
> > +void h(v8u *a,v8u *b,v8i *c)
> > +{
> > +  *c = *c - (v8i)(*a * *b);
> > +}
> > +
> > +void x(v8i *a,v8i *b,v8i *c)
> > +{
> > +  *c = *c - (*a * *b);
> > +}
> > +
> > +void y(v8u *a,v8u *b,v8u *c)
> > +{
> > +  *c = *c - (*a * *b);
> > +}
> > +
> > +/* { dg-final { scan-assembler-times "\\tmsb\\t" 4 } } */
> > +/* { dg-final { scan-assembler-not "\\tneg\\t" } } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr123897.c b/gcc/testsuite/gcc.target/aarch64/sve/pr123897.c
> > index d74efabb7f8..45bc52522a9 100644
> > --- a/gcc/testsuite/gcc.target/aarch64/sve/pr123897.c
> > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr123897.c
> > @@ -13,4 +13,5 @@ void g(v8i *a,v8i *b,v8u *c)
> >    *c = *c - (v8u)(*a * *b);
> >  }
> >
> > -/* { dg-final { scan-tree-dump-times "\.FMA" 2 "widening_mul" } } */
> > +/* { dg-final { scan-tree-dump-times "\.FMA" 1 "widening_mul" } } */
> > +/* { dg-final { scan-tree-dump-times "\.FNMA" 1 "widening_mul" } } */
> > --
> > 2.43.0
> >
