On Tue, Mar 10, 2026 at 1:20 PM Richard Sandiford <[email protected]> wrote: > > Richard Biener <[email protected]> writes: > > On Mon, Mar 9, 2026 at 10:22 AM Eikansh Gupta <[email protected]> > > wrote: > >> > >> Add two match.pd patterns that simplify COND_ADD when the mask is > >> a strict greater-than comparison against zero and the else value equals > >> one of the addition operands: > >> > >> COND_ADD (a > 0, b, a, b) -> b + MAX_EXPR <a, 0> > >> COND_ADD (a > 0, a, b, b) -> b + MAX_EXPR <a, 0> > > > > I'll note this has to wait for stage1. > > > > Also I had expected a predicated add to be faster than a separate max > > + add? It's > > definitely more canonical on GIMPLE given it's a single stmt vs. two. > > Both suggest > > that match.pd isn't a good place to do such transform. > > If the gt is single-use then I suppose it's gt + cond_add vs max + add. > And add is arguably simpler than cond_add. Would adding :s to the gt > make it more acceptable?
Yes, this would at least clarify the intent. The pattern also lacks a test whether unconditional add and max are supported by the target. > That said, having rules in this direction (condition op to multiple > unconditional ops) sounds like it might risk cycles... If we also have the reverse, then yes. Richard. > Richard > > > Richard. > > > >> PR 124097 > >> > >> gcc/ChangeLog: > >> > >> * match.pd (COND_ADD (a > 0, a, b, b) -> b + MAX_EXPR <a, 0>): > >> New pattern. > >> > >> gcc/testsuite/ChangeLog: > >> > >> * gcc.dg/tree-ssa/pr124097.c: New test. > >> > >> Signed-off-by: Eikansh Gupta <[email protected]> > >> --- > >> gcc/match.pd | 15 +++++++++++++++ > >> gcc/testsuite/gcc.dg/tree-ssa/pr124097.c | 17 +++++++++++++++++ > >> 2 files changed, 32 insertions(+) > >> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr124097.c > >> > >> diff --git a/gcc/match.pd b/gcc/match.pd > >> index 7f16fd4e081..2bb4f425895 100644 > >> --- a/gcc/match.pd > >> +++ b/gcc/match.pd > >> @@ -11323,6 +11323,21 @@ and, > >> && fold_real_zero_addition_p (type, NULL_TREE, @4, 0))) > >> (IFN_COND_ADD (bit_and @0 @2) @1 @3 @1))) > >> > >> +/* COND_ADD (a > 0, b, a, b) -> b + max (a, 0) > >> + COND_ADD (a > 0, a, b, b) -> b + max (a, 0) */ > >> +(simplify > >> + (IFN_COND_ADD (gt @0 zerop@1) @2 @0 @2) > >> + (if (FLOAT_TYPE_P (type) > >> + && !HONOR_NANS (type) > >> + && !HONOR_SIGNED_ZEROS (type)) > >> + (plus @2 (max @0 @1)))) > >> +(simplify > >> + (IFN_COND_ADD (gt @0 zerop@1) @0 @2 @2) > >> + (if (FLOAT_TYPE_P (type) > >> + && !HONOR_NANS (type) > >> + && !HONOR_SIGNED_ZEROS (type)) > >> + (plus @2 (max @0 @1)))) > >> + > >> /* Detect simplication for a conditional length reduction where > >> > >> a = mask ? b : 0 > >> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr124097.c > >> b/gcc/testsuite/gcc.dg/tree-ssa/pr124097.c > >> new file mode 100644 > >> index 00000000000..5769a2ca5d0 > >> --- /dev/null > >> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr124097.c > >> @@ -0,0 +1,17 @@ > >> +/* { dg-do compile } */ > >> +/* { dg-options "-O3 -ffast-math -march=armv9-a -fdump-tree-optimized" } > >> */ > >> + > >> +float __attribute__ ((noipa)) > >> +foo (float *a) > >> +{ > >> + float sum = 0.; > >> + for (int i = 0; i < 32000; i++) > >> + if (a[i] > (float) 0.) > >> + sum += a[i]; > >> + return sum; > >> +} > >> + > >> +/* The main vectorized loop should have COND_ADD simplified to MAX_EXPR + > >> plus. > >> + The SVE epilogue retains COND_ADD */ > >> +/* { dg-final { scan-tree-dump-times { MAX_EXPR } 2 "optimized" } } */ > >> +/* { dg-final { scan-tree-dump-times { \.COND_ADD } 2 "optimized" } } */ > >> -- > >> 2.34.1 > >>
