On 26/11/2025 11:51, Dhruv Chawla wrote:
> On 25/11/25 19:58, Alex Coplan wrote:
> >
> > On 05/11/2025 10:39, Dhruv Chawla wrote:
> > > On 28/10/25 22:03, Alex Coplan wrote:
> > > >
> > > > Hi Dhruv,
> > > >
> > > > Sorry for the long wait on this one. Comments below ...
> > >
> > > Hi Alex,
> > >
> > > Thanks for the review. I have attached a new version of the patch
> > > to this email.
> >
> > Thanks, and sorry to be slow again...
> >
> > >
> > > >
> > > > On 18/08/2025 21:31, [email protected] wrote:
> > > > > From: Dhruv Chawla <[email protected]>
> > > > >
> > > > > This patch folds the following patterns:
> > > > > - umax (a, add (a, b)) -> [sum, ovf] = adds (a, b); !ovf ? sum : a
> > > > > - umin (a, add (a, b)) -> [sum, ovf] = adds (a, b); !ovf ? a : sum
> > > > > - umax (a, sub (a, b)) -> [diff, ovf] = subs (a, b); ovf ? diff : a
> > > > > - umin (a, sub (a, b)) -> [diff, ovf] = subs (a, b); ovf ? a : diff
> > > > >
> > > > > Where ovf is the overflow flag. adds and subs are generated by emitting
> > > > > a parallel compare+plus/minus which maps to the pattern
> > > > > add<mode>3_compareC. sub<mode>3_compareC is also created to have an
> > > > > equivalent pattern for the subs instruction. In the case of subs, the
> > > > > overflow flag represents underflow.
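
[An aside for clarity, not part of the quoted patch: the add case in plain C.
For unsigned values the wrapped sum compares below a exactly when the addition
overflowed, so selecting on that condition reproduces umax (a, a + b); the
function name below is just for illustration:

  #include <assert.h>
  #include <limits.h>

  /* Illustration only: mimics the "adds; csel" sequence for umax (a, a + b).  */
  static unsigned
  umax_add_via_flags (unsigned a, unsigned b)
  {
    unsigned sum = a + b;    /* adds */
    int ovf = sum < a;       /* the addition wrapped, i.e. the ovf flag */
    return !ovf ? sum : a;   /* csel */
  }

  int
  main (void)
  {
    unsigned v[] = { 0u, 1u, 5u, 0x90000000u, 0xa0000000u, UINT_MAX };
    for (unsigned i = 0; i < sizeof v / sizeof v[0]; i++)
      for (unsigned j = 0; j < sizeof v / sizeof v[0]; j++)
        {
          unsigned a = v[i], b = v[j], sum = a + b;
          assert (umax_add_via_flags (a, b) == (a > sum ? a : sum));
        }
    return 0;
  }

The umin case is the same selection with the two csel operands swapped.]
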
> > > > >
> > > > > This patch is a respin of the patch posted at
> > > > > https://gcc.gnu.org/pipermail/gcc-patches/2025-May/685021.html as per
> > > > > the suggestion to turn it into a target-specific transform by Richard
> > > > > Biener.
> > > > >
> > > > > Bootstrapped and regtested on aarch64-unknown-linux-gnu.
> > > > >
> > > > > Signed-off-by: Dhruv Chawla <[email protected]>
> > > > >
> > > > > PR middle-end/116815
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > * config/aarch64/aarch64.md (sub<mode>3_compareC): New
> > > > > pattern.
> > > > > (*aarch64_plus_within_<optab><mode>3_<ovf_commutate>):
> > > > > Likewise.
> > > > > (*aarch64_minus_within_<optab><mode>3): Likewise.
> > > > > * config/aarch64/iterators.md (ovf_add_cmp): New code
> > > > > attribute.
> > > > > (ovf_sub_cmp): Likewise.
> > > > > (ovf_commutate): New iterator.
> > > > > (ovf_comm_opp): New int attribute.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > > * gcc.target/aarch64/pr116815-1.c: New test.
> > > > > * gcc.target/aarch64/pr116815-2.c: Likewise.
> > > > > * gcc.target/aarch64/pr116815-3.c: Likewise.
> > > > > * gcc.target/aarch64/pr116815-4.c: Likewise.
> > > > > * gcc.target/aarch64/pr116815-5.c: Likewise.
> > > > > * gcc.target/aarch64/pr116815-6.c: Likewise.
> > > > > ---
> > > > > gcc/config/aarch64/aarch64.md | 76 +++++++++++
> > > > > gcc/config/aarch64/iterators.md | 9 ++
> > > > > gcc/testsuite/gcc.target/aarch64/pr116815-1.c | 120 ++++++++++++++
> > > > > gcc/testsuite/gcc.target/aarch64/pr116815-2.c | 93 ++++++++++++++
> > > > > gcc/testsuite/gcc.target/aarch64/pr116815-3.c | 62 +++++++++
> > > > > gcc/testsuite/gcc.target/aarch64/pr116815-4.c | 48 +++++++
> > > > > gcc/testsuite/gcc.target/aarch64/pr116815-5.c | 44 +++++++
> > > > > gcc/testsuite/gcc.target/aarch64/pr116815-6.c | 60 +++++++++
> > > > > 8 files changed, 512 insertions(+)
> > > > > create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-1.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-2.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-3.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-4.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-5.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-6.c
> > > > >
> > > > > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> > > > > index 8e431a10cb1..a55fe5c85d8 100644
> > > > > --- a/gcc/config/aarch64/aarch64.md
> > > > > +++ b/gcc/config/aarch64/aarch64.md
> > > > > @@ -3715,6 +3715,20 @@
> > > > > [(set_attr "type" "alus_sreg")]
> > > > > )
> > > > >
> > > > > +;; An equivalent to add<mode>3_compareC
> > > > > +(define_insn "sub<mode>3_compareC"
> > > > > + [(set (reg:CC_C CC_REGNUM)
> > > > > + (compare:CC_C
> > > > > + (minus:GPI
> > > > > + (match_operand:GPI 1 "register_operand" "r")
> > > > > + (match_operand:GPI 2 "register_operand" "r"))
> > > > > + (match_dup 1)))
> > > > > + (set (match_operand:GPI 0 "register_operand" "=r")
> > > > > + (minus:GPI (match_dup 1) (match_dup 2)))]
> > > > > + ""
> > > > > + "subs\t%<w>0, %<w>1, %<w>2"
> > > > > +)
> > > > > +
> > > >
> > > > I don't think we want/need this pattern. subs(a,b) underflows precisely
> > > > when a < b, i.e. when a - b < 0 so you can just use a plain CCmode
> > > > comparison. I think you can make use of the existing
> > > > sub<mode>3_compare1 pattern here.
> > > >
> > > > This should simplify the define_insn_and_split for the subs case below.
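
[To spell that equivalence out in plain C (my illustration, not code from the
patch): the unsigned difference a - b wraps exactly when a < b, so the
"underflow" of subs is the same condition as the ordinary unsigned comparison
that a plain CCmode compare provides:

  #include <assert.h>
  #include <limits.h>

  /* Illustration only: the wrapped difference exceeds the minuend iff
     the subtraction underflowed, i.e. iff a < b.  */
  static void
  check (unsigned a, unsigned b)
  {
    unsigned diff = a - b;         /* subs */
    int underflow = diff > a;
    assert (underflow == (a < b));
  }

  int
  main (void)
  {
    check (0u, 0u);
    check (0u, 1u);
    check (5u, 3u);
    check (0u, 0xa0000000u);
    check (UINT_MAX, 1u);
    return 0;
  }
]
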
> > > >
> > > > > (define_peephole2
> > > > > [(set (match_operand:GPI 0 "aarch64_general_reg")
> > > > > (minus:GPI (match_operand:GPI 1 "aarch64_reg_or_zero")
> > > > > @@ -4455,6 +4469,68 @@
> > > > > [(set_attr "type" "<su>div")]
> > > > > )
> > > > >
> > > > > +;; umax (a, add (a, b)) => [sum, ovf] = adds (a, b); !ovf ? sum : a
> > > > > +;; umin (a, add (a, b)) => [sum, ovf] = adds (a, b); !ovf ? a : sum
> > > > > +;; ... along with the commutative version of add (a, b), i.e. add (b, a).
> > > >
> > > > For the sake of saving one comment line, it might be better to actually
> > > > spell out the commutated variant, AIUI e.g. for umax that's:
> > > > umax (b, add (a, b)) => [sum, ovf] = adds (b, a); !ovf ? sum : b
> > >
> > > Done. I also added this to the patch description.
> > >
> > > >
> > > > > +(define_insn_and_split "*aarch64_plus_within_<optab><mode>3_<ovf_commutate>"
> > > > > + [(set (match_operand:GPI 0 "register_operand" "=r")
> > > > > + (UMAXMIN:GPI
> > > > > + (plus:GPI (match_operand:GPI 1 "register_operand" "r")
> > > > > + (match_operand:GPI 2 "register_operand" "r"))
> > > > > + (match_dup ovf_commutate)))
> > > > > + (clobber (match_scratch:GPI 3))]
> > > >
> > > > I think you might want an "=r" constraint on the match_scratch here.
> > >
> > > I am a bit confused here - I expect this pattern to only execute pre-RA
> > > because it requires being able to call gen_reg_rtx (which only works
> > > pre-RA). Does RA populate scratch registers by itself? Otherwise, I
> > > don't really see any other way for this split to run post-RA.
> >
> > So having done some digging on this, the point of this idiom:
> >
> > > + if (GET_CODE (operands[3]) == SCRATCH)
> > > + operands[3] = gen_reg_rtx (<MODE>mode);
> >
> > is to allow the splitter to work both before and after RA. Indeed, the
> > RA will allocate hard registers in place of scratch registers, provided
> > there is a suitable constraint on the operand.
> >
> > I think whatever we do the split condition should be "&& 1"; as Uros
> > pointed out, allowing the insn to match after RA but not allowing the
> > insn to be split is an ICE trap.
> >
> > We then need to decide whether to restrict the define_insn_and_splits to
> > pre-RA only, or allow them both before and after RA. I don't see a good
> > reason to forbid them from matching after RA, so I think the patch is OK
> > with the constraints on the scratch registers added, and the split
> > conditions changed to "&& 1". I've added comments to this effect below.
> >
> > Note that if we _were_ to restrict the pattern to pre-RA only, I agree
> > with all of Uros's comments, and we would have to define
> > aarch64_pre_reload_split and use that in the insn condition for the
> > define_insn_and_splits, but incrementally it seems easier to me to just
> > allow the pattern to work both before and after RA.
> >
> > >
> > > Uros: IIRC I ran into issues where unconditionally calling gen_reg_rtx
> > > still caused an ICE, maybe because the code ran during reload itself.
> > > I guess I could replace "!reload_completed" with "can_create_pseudo_p ()"?
> > >
> > > >
> > > > > + "!TARGET_CSSC"
> > > > > + "#"
> > > > > + "&& !reload_completed"
> > > > > + [(parallel
> > > > > + [(set (reg:CC_C CC_REGNUM)
> > > > > + (compare:CC_C (plus:GPI (match_dup ovf_commutate)
> > > > > + (match_dup <ovf_comm_opp>))
> > > > > + (match_dup ovf_commutate)))
> > > > > + (set (match_dup 3) (plus:GPI (match_dup ovf_commutate)
> > > > > + (match_dup <ovf_comm_opp>)))])
> > > > > + (set (match_dup 0)
> > > > > + (if_then_else:GPI (<ovf_add_cmp> (reg:CC_C CC_REGNUM)
> > > > > + (const_int 0))
> > > > > + (match_dup 3)
> > > > > + (match_dup ovf_commutate)))]
> > > > > + {
> > > > > + if (GET_CODE (operands[3]) == SCRATCH)
> > > > > + operands[3] = gen_reg_rtx (<MODE>mode);
> > > >
> > > > At first I was a bit unsure about this idiom, since md.texi:9358 has
> > > > this to say about define_split (and therefore about
> > > > define_insn_and_split):
> > > >
> > > >   The @var{preparation-statements} are similar to those statements that
> > > >   are specified for @code{define_expand} (@pxref{Expander Definitions})
> > > >   and are executed before the new RTL is generated to prepare for the
> > > >   generated code or emit some insns whose pattern is not fixed.  Unlike
> > > >   those in @code{define_expand}, however, these statements must not
> > > >   generate any new pseudo-registers.  Once reload has completed, they also
> > > >   must not allocate any space in the stack frame.
> > > >
> > > > but I see that we already use this idiom for a few patterns in
> > > > aarch64-sve.md.  Would appreciate a second opinion on this.  Either the
> > > > existing SVE patterns are wrong or (perhaps more likely) the md.texi docs
> > > > are out of date.
> > > >
> > > > > + }
> > > > > +)
> > > > > +
> > > > > +;; In this case the ovf represents an underflow.
> > > > > +;; umax (a, sub (a, b)) => [diff, ovf] = subs (a, b); ovf ? diff : a
> > > > > +;; umin (a, sub (a, b)) => [diff, ovf] = subs (a, b); ovf ? a : diff
> > > > > +(define_insn_and_split "*aarch64_minus_within_<optab><mode>3"
> > > > > + [(set (match_operand:GPI 0 "register_operand" "=r")
> > > > > + (UMAXMIN:GPI
> > > > > + (minus:GPI (match_operand:GPI 1 "register_operand" "r")
> > > > > + (match_operand:GPI 2 "register_operand" "r"))
> > > > > + (match_dup 1)))
> > > > > + (clobber (match_scratch:GPI 3))]
> > > > > + "!TARGET_CSSC"
> > > > > + "#"
> > > > > + "&& !reload_completed"
> > > > > + [(parallel
> > > > > + [(set (reg:CC_C CC_REGNUM)
> > > > > + (compare:CC_C (minus:GPI (match_dup 1) (match_dup 2))
> > > > > + (match_dup 1)))
> > > > > + (set (match_dup 3) (minus:GPI (match_dup 1) (match_dup 2)))])
> > > > > + ;; There is an inverse mapping here from CC_C -> CC as this requires the
> > > > > + ;; inverse of the comparison from the above pattern.
> > > >
> > > > As mentioned above you should be able to do the whole thing in CCmode,
> > > > which should simplify things here.
> > > >
> > > > > + (set (match_dup 0)
> > > > > + (if_then_else:GPI (<ovf_sub_cmp> (reg:CC CC_REGNUM)
> > > > > + (const_int 0))
> > > > > + (match_dup 3)
> > > > > + (match_dup 1)))]
> > > > > + {
> > > > > + if (GET_CODE (operands[3]) == SCRATCH)
> > > > > + operands[3] = gen_reg_rtx (<MODE>mode);
> > > > > + }
> > > > > +)
> > > > > +
> > > > > ;; -------------------------------------------------------------------
> > > > > ;; Comparison insns
> > > > > ;; -------------------------------------------------------------------
> > > > > diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> > > > > index c3771d9402b..a29243247bd 100644
> > > > > --- a/gcc/config/aarch64/iterators.md
> > > > > +++ b/gcc/config/aarch64/iterators.md
> > > > > @@ -2804,6 +2804,8 @@
> > > > >
> > > > > (define_code_iterator FMAXMIN [smax smin])
> > > > >
> > > > > +(define_code_iterator UMAXMIN [umax umin])
> > > > > +
> > > > > ;; Signed and unsigned max operations.
> > > > > (define_code_iterator USMAX [smax umax])
> > > > >
> > > > > @@ -3092,6 +3094,9 @@
> > > > >
> > > > > (define_code_attr maxminand [(smax "bic") (smin "and")])
> > > > >
> > > > > +(define_code_attr ovf_add_cmp [(umax "geu") (umin "ltu")])
> > > > > +(define_code_attr ovf_sub_cmp [(umax "ltu") (umin "geu")])
> > > > > +
> > > > > ;; MLA/MLS attributes.
> > > > > (define_code_attr as [(ss_plus "a") (ss_minus "s")])
> > > > >
> > > > > @@ -5007,3 +5012,7 @@
> > > > > (UNSPEC_F2CVT "f2cvt")
> > > > > (UNSPEC_F1CVTLT "f1cvtlt")
> > > > > (UNSPEC_F2CVTLT "f2cvtlt")])
> > > > > +
> > > > > +;; Operand numbers for commutative operations
> > > > > +(define_int_iterator ovf_commutate [1 2])
> > > >
> > > > Note that this iterator already exists with the name BSL_DUP. It might
> > > > be better to rename it to something more general and re-purpose it here,
> > > > rather than duplicating it.
> > >
> > > Yeah, I saw that as well - using BSL_DUP and bsl_mov would achieve the same
> > > purpose, but I am not sure what commonality between my usage and BSL can be
> > > used for naming? ONE_TWO / two_one are the obvious names, but they don't seem
> > > very good.
> >
> > Yeah, those are the names that came to mind for me, too, but it's
> > arguably less readable in context, so IMO it's OK to duplicate the
> > iterator for the sake of better readability (as the current patch does),
> > if you prefer.
> >
> > >
> > > >
> > > > > +(define_int_attr ovf_comm_opp [(1 "2") (2 "1")])
> > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/pr116815-1.c b/gcc/testsuite/gcc.target/aarch64/pr116815-1.c
> > > > > new file mode 100644
> > > > > index 00000000000..f3bdb794318
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/aarch64/pr116815-1.c
> > > > > @@ -0,0 +1,120 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-options "-O2" } */
> > > > > +/* { dg-final { check-function-bodies "**" "" "" } } */
> > > > > +
> > > > > +/* PR middle-end/116815 */
> > > > > +
> > > > > +/* Single-use tests. */
> > > > > +
> > > > > +static inline unsigned __attribute__ ((always_inline))
> > > > > +max (unsigned a, unsigned b)
> > > > > +{
> > > > > + return a > b ? a : b;
> > > > > +}
> > > > > +
> > > > > +static inline unsigned __attribute__ ((always_inline))
> > > > > +min (unsigned a, unsigned b)
> > > > > +{
> > > > > + return a < b ? a : b;
> > > > > +}
> > > > > +
> > > > > +#define OPERATION(op, type, N, exp1, exp2) \
> > > > > +  unsigned u##op##type##N (unsigned a, unsigned b) { return op (exp1, exp2); }
> > > > > +
> > > > > +/*
> > > > > +** umaxadd1:
> > > > > +** adds (w[0-9]+), w0, w1
> > > > > +** csel w0, \1, w0, cc
> > > > > +** ret
> > > > > +*/
> > > > > +OPERATION (max, add, 1, a, a + b)
> > > > > +
> > > > > +/*
> > > > > +** umaxadd2:
> > > > > +** adds (w[0-9]+), w0, w1
> > > > > +** csel w0, \1, w0, cc
> > > > > +** ret
> > > > > +*/
> > > > > +OPERATION (max, add, 2, a, b + a)
> > > > > +
> > > > > +/*
> > > > > +** umaxadd3:
> > > > > +** adds (w[0-9]+), w0, w1
> > > > > +** csel w0, \1, w0, cc
> > > > > +** ret
> > > > > +*/
> > > > > +OPERATION (max, add, 3, a + b, a)
> > > > > +
> > > > > +/*
> > > > > +** umaxadd4:
> > > > > +** adds (w[0-9]+), w0, w1
> > > > > +** csel w0, \1, w0, cc
> > > > > +** ret
> > > > > +*/
> > > > > +OPERATION (max, add, 4, b + a, a)
> > > > > +
> > > > > +/*
> > > > > +** uminadd1:
> > > > > +** adds (w[0-9]+), w0, w1
> > > > > +** csel w0, \1, w0, cs
> > > > > +** ret
> > > > > +*/
> > > > > +OPERATION (min, add, 1, a, a + b)
> > > > > +
> > > > > +/*
> > > > > +** uminadd2:
> > > > > +** adds (w[0-9]+), w0, w1
> > > > > +** csel w0, \1, w0, cs
> > > > > +** ret
> > > > > +*/
> > > > > +OPERATION (min, add, 2, a, b + a)
> > > > > +
> > > > > +/*
> > > > > +** uminadd3:
> > > > > +** adds (w[0-9]+), w0, w1
> > > > > +** csel w0, \1, w0, cs
> > > > > +** ret
> > > > > +*/
> > > > > +OPERATION (min, add, 3, a + b, a)
> > > > > +
> > > > > +/*
> > > > > +** uminadd4:
> > > > > +** adds (w[0-9]+), w0, w1
> > > > > +** csel w0, \1, w0, cs
> > > > > +** ret
> > > > > +*/
> > > > > +OPERATION (min, add, 4, b + a, a)
> > > > > +
> > > > > +/* sub requires the inverse of the comparison from add. */
> > > > > +
> > > > > +/*
> > > > > +** umaxsub1:
> > > > > +** subs (w[0-9]+), w0, w1
> > > > > +** csel w0, \1, w0, cc
> > > > > +** ret
> > > > > +*/
> > > > > +OPERATION (max, sub, 1, a, a - b)
> > > > > +
> > > > > +/*
> > > > > +** umaxsub2:
> > > > > +** subs (w[0-9]+), w0, w1
> > > > > +** csel w0, \1, w0, cc
> > > > > +** ret
> > > > > +*/
> > > > > +OPERATION (max, sub, 2, a - b, a)
> > > > > +
> > > > > +/*
> > > > > +** uminsub1:
> > > > > +** subs (w[0-9]+), w0, w1
> > > > > +** csel w0, \1, w0, cs
> > > > > +** ret
> > > > > +*/
> > > > > +OPERATION (min, sub, 1, a, a - b)
> > > > > +
> > > > > +/*
> > > > > +** uminsub2:
> > > > > +** subs (w[0-9]+), w0, w1
> > > > > +** csel w0, \1, w0, cs
> > > > > +** ret
> > > > > +*/
> > > > > +OPERATION (min, sub, 2, a - b, a)
> > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/pr116815-2.c b/gcc/testsuite/gcc.target/aarch64/pr116815-2.c
> > > > > new file mode 100644
> > > > > index 00000000000..f2dd7b0bbb3
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/aarch64/pr116815-2.c
> > > > > @@ -0,0 +1,93 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-options "-O2" } */
> > > > > +
> > > > > +/* PR middle-end/116815 */
> > > > > +
> > > > > +/* Negative tests. */
> > > >
> > > > I don't see the benefit in including negative tests here, the tests will
> > > > get optimized out by the middle-end anyway due to C overflow rules.
> > >
> > > Sure. I have removed this testcase in the new revision.
> > >
> > > >
> > > > > +
> > > > > +static inline int __attribute__ ((always_inline))
> > > > > +smax (int a, int b)
> > > > > +{
> > > > > + return a > b ? a : b;
> > > > > +}
> > > > > +
> > > > > +static inline int __attribute__ ((always_inline))
> > > > > +smin (int a, int b)
> > > > > +{
> > > > > + return a < b ? a : b;
> > > > > +}
> > > > > +
> > > > > +static inline unsigned __attribute__ ((always_inline))
> > > > > +umax (unsigned a, unsigned b)
> > > > > +{
> > > > > + return a > b ? a : b;
> > > > > +}
> > > > > +
> > > > > +static inline unsigned __attribute__ ((always_inline))
> > > > > +umin (unsigned a, unsigned b)
> > > > > +{
> > > > > + return a < b ? a : b;
> > > > > +}
> > > > > +
> > > > > +#define ASSUME(cond) if (!(cond)) __builtin_unreachable ();
> > > > > +
> > > > > +/* This transformation does not trigger on signed types. */
> > > > > +
> > > > > +int
> > > > > +smax_add (int a, int b)
> > > > > +{
> > > > > + ASSUME (b >= 0);
> > > > > + return smax (a, a + b);
> > > > > +}
> > > > > +
> > > > > +int
> > > > > +smin_add (int a, int b)
> > > > > +{
> > > > > + ASSUME (b >= 0);
> > > > > + return smin (a, a + b);
> > > > > +}
> > > > > +
> > > > > +int
> > > > > +smax_sub (int a, int b)
> > > > > +{
> > > > > + ASSUME (b >= 0);
> > > > > + return smax (a, a - b);
> > > > > +}
> > > > > +
> > > > > +int
> > > > > +smin_sub (int a, int b)
> > > > > +{
> > > > > + ASSUME (b >= 0);
> > > > > + return smin (a, a - b);
> > > > > +}
> > > > > +
> > > > > +/* Invalid patterns. */
> > > > > +
> > > > > +/* This can potentially be matched, but the RHS gets factored to
> > > > > + (a + b) * b. */
> > > > > +unsigned
> > > > > +umax_factored (unsigned a, unsigned b)
> > > > > +{
> > > > > + return umax (a * b, a * b + b * b);
> > > > > +}
> > > > > +
> > > > > +unsigned
> > > > > +umin_mult (unsigned a, unsigned b)
> > > > > +{
> > > > > + return umin (a, a * b);
> > > > > +}
> > > > > +
> > > > > +unsigned
> > > > > +umax_sub (unsigned a, unsigned b)
> > > > > +{
> > > > > + return umax (a, b - a);
> > > > > +}
> > > > > +
> > > > > +unsigned
> > > > > +umin_sub (unsigned a, unsigned b)
> > > > > +{
> > > > > + return umin (a, b - a);
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not "adds\\t" } } */
> > > > > +/* { dg-final { scan-assembler-not "subs\\t" } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/pr116815-3.c b/gcc/testsuite/gcc.target/aarch64/pr116815-3.c
> > > > > new file mode 100644
> > > > > index 00000000000..f8050377116
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/aarch64/pr116815-3.c
> > > > > @@ -0,0 +1,62 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-options "-O2" } */
> > > > > +/* { dg-final { check-function-bodies "**" "" "" } } */
> > > > > +
> > > > > +/* PR middle-end/116815 */
> > > > > +
> > > > > +/* Multi-use tests. */
> > > > > +
> > > > > +static inline unsigned __attribute__ ((always_inline))
> > > > > +max (unsigned a, unsigned b)
> > > > > +{
> > > > > + return a > b ? a : b;
> > > > > +}
> > > > > +
> > > > > +static inline unsigned __attribute__ ((always_inline))
> > > > > +min (unsigned a, unsigned b)
> > > > > +{
> > > > > + return a < b ? a : b;
> > > > > +}
> > > > > +
> > > > > +/* FIXME: This should only generate one adds. */
> > > >
> > > > I appreciate you raising the shortcomings of the patch, but perhaps better
> > > > just to mention it in the cover letter rather than adding a test that
> > > > explicitly defends the suboptimal codegen.  AIUI, check-function-bodies
> > > > tests are intended for when we're confident there is one optimal sequence
> > > > of codegen that we want to generate.
> > >
> > > I have added the FIXME as a comment in the commit message, does that seem
> > > okay?
> >
> > Yep, that's fine.
> >
> > > This testcase is actually a leftover from when the patch was a GIMPLE
> > > transform, because this used to work before and is a regression after
> > > switching to a target-specific transform. I have removed this testcase now.
> >
> > I see, thanks.
> >
> > <snip>
> >
> > > From 588dca91b719168f8ff98b7e8ee4b395bccc7cda Mon Sep 17 00:00:00 2001
> > > From: Dhruv Chawla <[email protected]>
> > > Date: Wed, 23 Jul 2025 01:41:51 -0700
> > > Subject: [PATCH] [aarch64] Make better use of overflowing operations in
> > > max/min(a, add/sub(a, b)) [PR116815]
> > >
> > > This patch folds the following patterns:
> > > - For add:
> > > - umax (a, add (a, b)) -> [sum, ovf] = adds (a, b); !ovf ? sum : a
> > > - umin (a, add (a, b)) -> [sum, ovf] = adds (a, b); !ovf ? a : sum
> > > ... along with the commutated versions:
> > > - umax (a, add (b, a)) -> [sum, ovf] = adds (b, a); !ovf ? sum : a
> > > - umin (a, add (b, a)) -> [sum, ovf] = adds (b, a); !ovf ? a : sum
> > > - For sub:
> > > - umax (a, sub (a, b)) -> [diff, udf] = subs (a, b); udf ? diff : a
> > > - umin (a, sub (a, b)) -> [diff, udf] = subs (a, b); udf ? a : diff
> > >
> > > Where ovf is the overflow flag and udf is the underflow flag. adds and subs
> > > are generated by emitting a parallel compare+plus/minus, which maps to
> > > add<mode>3_compareC and sub<mode>3_compare1 respectively.
> > >
> > > This patch is a respin of the patch posted at
> > > https://gcc.gnu.org/pipermail/gcc-patches/2025-May/685021.html as per
> > > the suggestion to turn it into a target-specific transform by Richard
> > > Biener.
> > >
> > > FIXME: This pattern cannot currently factor multiple occurrences of the
> > > add expression into a single adds, e.g. max (a, a + b) + min (a + b, b)
> > > ends up generating two adds instructions. This is something that was
> > > lost when going from GIMPLE to target-specific transforms.
> > >
> > > Bootstrapped and regtested on aarch64-unknown-linux-gnu.
> > >
> > > Signed-off-by: Dhruv Chawla <[email protected]>
> > >
> > > PR middle-end/116815
> > >
> > > gcc/ChangeLog:
> > >
> > > * config/aarch64/aarch64.md
> > > (*aarch64_plus_within_<optab><mode>3_<ovf_commutate>): New pattern.
> > > (*aarch64_minus_within_<optab><mode>3): Likewise.
> > > * config/aarch64/iterators.md (ovf_add_cmp): New code attribute.
> > > (udf_sub_cmp): Likewise.
> > > (UMAXMIN): New code iterator.
> > > (ovf_commutate): New iterator.
> > > (ovf_comm_opp): New int attribute.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/aarch64/pr116815-1.c: New test.
> > > * gcc.target/aarch64/pr116815-2.c: Likewise.
> > > * gcc.target/aarch64/pr116815-3.c: Likewise.
> > > * gcc.target/aarch64/pr116815-4.c: Likewise.
> > > ---
> > > gcc/config/aarch64/aarch64.md | 60 +++++++++
> > > gcc/config/aarch64/iterators.md | 9 ++
> > > gcc/testsuite/gcc.target/aarch64/pr116815-1.c | 120 ++++++++++++++++++
> > > gcc/testsuite/gcc.target/aarch64/pr116815-2.c | 48 +++++++
> > > gcc/testsuite/gcc.target/aarch64/pr116815-3.c | 44 +++++++
> > > gcc/testsuite/gcc.target/aarch64/pr116815-4.c | 60 +++++++++
> > > 6 files changed, 341 insertions(+)
> > > create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-1.c
> > > create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-2.c
> > > create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-3.c
> > > create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-4.c
> > >
> > > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> > > index 98c65a74c8e..307df4e92ba 100644
> > > --- a/gcc/config/aarch64/aarch64.md
> > > +++ b/gcc/config/aarch64/aarch64.md
> > > @@ -4488,6 +4488,66 @@
> > > [(set_attr "type" "<su>div")]
> > > )
> > > +;; umax (a, add (a, b)) => [sum, ovf] = adds (a, b); !ovf ? sum : a
> > > +;; umin (a, add (a, b)) => [sum, ovf] = adds (a, b); !ovf ? a : sum
> > > +;; ... and the commutated versions:
> > > +;; umax (a, add (b, a)) => [sum, ovf] = adds (b, a); !ovf ? sum : a
> > > +;; umin (a, add (b, a)) => [sum, ovf] = adds (b, a); !ovf ? a : sum
> > > +(define_insn_and_split "*aarch64_plus_within_<optab><mode>3_<ovf_commutate>"
> > > + [(set (match_operand:GPI 0 "register_operand" "=r")
> > > + (UMAXMIN:GPI
> > > + (plus:GPI (match_operand:GPI 1 "register_operand" "r")
> > > + (match_operand:GPI 2 "register_operand" "r"))
> > > + (match_dup ovf_commutate)))
> > > + (clobber (match_scratch:GPI 3))]
> >
> > Please add an "=r" constraint here.
> >
> > > + "!TARGET_CSSC"
> > > + "#"
> > > + "&& !reload_completed"
> >
> > This should be "&& 1".
> >
> > > + [(parallel
> > > + [(set (reg:CC_C CC_REGNUM)
> > > + (compare:CC_C (plus:GPI (match_dup ovf_commutate)
> > > + (match_dup <ovf_comm_opp>))
> > > + (match_dup ovf_commutate)))
> > > + (set (match_dup 3) (plus:GPI (match_dup ovf_commutate)
> > > + (match_dup <ovf_comm_opp>)))])
> > > + (set (match_dup 0)
> > > + (if_then_else:GPI (<ovf_add_cmp> (reg:CC_C CC_REGNUM)
> > > + (const_int 0))
> > > + (match_dup 3)
> > > + (match_dup ovf_commutate)))]
> > > + {
> > > + if (GET_CODE (operands[3]) == SCRATCH)
> > > + operands[3] = gen_reg_rtx (<MODE>mode);
> > > + }
> > > +)
> > > +
> > > +;; umax (a, sub (a, b)) => [diff, udf] = subs (a, b); udf ? diff : a
> > > +;; umin (a, sub (a, b)) => [diff, udf] = subs (a, b); udf ? a : diff
> > > +(define_insn_and_split "*aarch64_minus_within_<optab><mode>3"
> > > + [(set (match_operand:GPI 0 "register_operand" "=r")
> > > + (UMAXMIN:GPI
> > > + (minus:GPI (match_operand:GPI 1 "register_operand" "r")
> > > + (match_operand:GPI 2 "register_operand" "r"))
> > > + (match_dup 1)))
> > > + (clobber (match_scratch:GPI 3))]
> >
> > Please add an "=r" constraint here.
> >
> > > + "!TARGET_CSSC"
> > > + "#"
> > > + "&& !reload_completed"
> >
> > This should be "&& 1".
> >
> > > + [(parallel
> > > + [(set (reg:CC CC_REGNUM)
> > > + (compare:CC (match_dup 1) (match_dup 2)))
> > > + (set (match_dup 3) (minus:GPI (match_dup 1) (match_dup 2)))])
> > > + (set (match_dup 0)
> > > + (if_then_else:GPI (<udf_sub_cmp> (reg:CC CC_REGNUM)
> > > + (const_int 0))
> > > + (match_dup 3)
> > > + (match_dup 1)))]
> > > + {
> > > + if (GET_CODE (operands[3]) == SCRATCH)
> > > + operands[3] = gen_reg_rtx (<MODE>mode);
> > > + }
> > > +)
> > > +
> > > ;; -------------------------------------------------------------------
> > > ;; Comparison insns
> > > ;; -------------------------------------------------------------------
> > > diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> > > index 517b2808b5f..74186cfae8c 100644
> > > --- a/gcc/config/aarch64/iterators.md
> > > +++ b/gcc/config/aarch64/iterators.md
> > > @@ -2827,6 +2827,8 @@
> > > (define_code_iterator FMAXMIN [smax smin])
> > > +(define_code_iterator UMAXMIN [umax umin])
> > > +
> > > ;; Signed and unsigned max operations.
> > > (define_code_iterator USMAX [smax umax])
> > > @@ -3115,6 +3117,9 @@
> > > (define_code_attr maxminand [(smax "bic") (smin "and")])
> > > +(define_code_attr ovf_add_cmp [(umax "geu") (umin "ltu")])
> > > +(define_code_attr udf_sub_cmp [(umax "ltu") (umin "geu")])
> > > +
> > > ;; MLA/MLS attributes.
> > > (define_code_attr as [(ss_plus "a") (ss_minus "s")])
> > > @@ -5126,3 +5131,7 @@
> > > (UNSPEC_F2CVT "f2cvt")
> > > (UNSPEC_F1CVTLT "f1cvtlt")
> > > (UNSPEC_F2CVTLT "f2cvtlt")])
> > > +
> > > +;; Operand numbers for commutative operations
> > > +(define_int_iterator ovf_commutate [1 2])
> > > +(define_int_attr ovf_comm_opp [(1 "2") (2 "1")])
> > > diff --git a/gcc/testsuite/gcc.target/aarch64/pr116815-1.c b/gcc/testsuite/gcc.target/aarch64/pr116815-1.c
> > > new file mode 100644
> > > index 00000000000..f3bdb794318
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/aarch64/pr116815-1.c
> > > @@ -0,0 +1,120 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2" } */
> > > +/* { dg-final { check-function-bodies "**" "" "" } } */
> > > +
> > > +/* PR middle-end/116815 */
> > > +
> > > +/* Single-use tests. */
> > > +
> > > +static inline unsigned __attribute__ ((always_inline))
> > > +max (unsigned a, unsigned b)
> > > +{
> > > + return a > b ? a : b;
> > > +}
> > > +
> > > +static inline unsigned __attribute__ ((always_inline))
> > > +min (unsigned a, unsigned b)
> > > +{
> > > + return a < b ? a : b;
> > > +}
> > > +
> > > +#define OPERATION(op, type, N, exp1, exp2) \
> > > +  unsigned u##op##type##N (unsigned a, unsigned b) { return op (exp1, exp2); }
> > > +
> > > +/*
> > > +** umaxadd1:
> > > +** adds (w[0-9]+), w0, w1
> > > +** csel w0, \1, w0, cc
> > > +** ret
> > > +*/
> > > +OPERATION (max, add, 1, a, a + b)
> > > +
> > > +/*
> > > +** umaxadd2:
> > > +** adds (w[0-9]+), w0, w1
> > > +** csel w0, \1, w0, cc
> > > +** ret
> > > +*/
> > > +OPERATION (max, add, 2, a, b + a)
> > > +
> > > +/*
> > > +** umaxadd3:
> > > +** adds (w[0-9]+), w0, w1
> > > +** csel w0, \1, w0, cc
> > > +** ret
> > > +*/
> > > +OPERATION (max, add, 3, a + b, a)
> > > +
> > > +/*
> > > +** umaxadd4:
> > > +** adds (w[0-9]+), w0, w1
> > > +** csel w0, \1, w0, cc
> > > +** ret
> > > +*/
> > > +OPERATION (max, add, 4, b + a, a)
> > > +
> > > +/*
> > > +** uminadd1:
> > > +** adds (w[0-9]+), w0, w1
> > > +** csel w0, \1, w0, cs
> > > +** ret
> > > +*/
> > > +OPERATION (min, add, 1, a, a + b)
> > > +
> > > +/*
> > > +** uminadd2:
> > > +** adds (w[0-9]+), w0, w1
> > > +** csel w0, \1, w0, cs
> > > +** ret
> > > +*/
> > > +OPERATION (min, add, 2, a, b + a)
> > > +
> > > +/*
> > > +** uminadd3:
> > > +** adds (w[0-9]+), w0, w1
> > > +** csel w0, \1, w0, cs
> > > +** ret
> > > +*/
> > > +OPERATION (min, add, 3, a + b, a)
> > > +
> > > +/*
> > > +** uminadd4:
> > > +** adds (w[0-9]+), w0, w1
> > > +** csel w0, \1, w0, cs
> > > +** ret
> > > +*/
> > > +OPERATION (min, add, 4, b + a, a)
> > > +
> > > +/* sub requires the inverse of the comparison from add. */
> > > +
> > > +/*
> > > +** umaxsub1:
> > > +** subs (w[0-9]+), w0, w1
> > > +** csel w0, \1, w0, cc
> > > +** ret
> > > +*/
> > > +OPERATION (max, sub, 1, a, a - b)
> > > +
> > > +/*
> > > +** umaxsub2:
> > > +** subs (w[0-9]+), w0, w1
> > > +** csel w0, \1, w0, cc
> > > +** ret
> > > +*/
> > > +OPERATION (max, sub, 2, a - b, a)
> > > +
> > > +/*
> > > +** uminsub1:
> > > +** subs (w[0-9]+), w0, w1
> > > +** csel w0, \1, w0, cs
> > > +** ret
> > > +*/
> > > +OPERATION (min, sub, 1, a, a - b)
> > > +
> > > +/*
> > > +** uminsub2:
> > > +** subs (w[0-9]+), w0, w1
> > > +** csel w0, \1, w0, cs
> > > +** ret
> > > +*/
> > > +OPERATION (min, sub, 2, a - b, a)
> > > diff --git a/gcc/testsuite/gcc.target/aarch64/pr116815-2.c b/gcc/testsuite/gcc.target/aarch64/pr116815-2.c
> > > new file mode 100644
> > > index 00000000000..eb1fdb94797
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/aarch64/pr116815-2.c
> > > @@ -0,0 +1,48 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2" } */
> > > +
> > > +/* PR middle-end/116815 */
> > > +
> > > +/* Multi-use testcase across basic blocks. */
> > > +
> > > +static inline unsigned __attribute__ ((always_inline))
> > > +max (unsigned a, unsigned b)
> > > +{
> > > + return a > b ? a : b;
> > > +}
> > > +
> > > +static inline unsigned __attribute__ ((always_inline))
> > > +min (unsigned a, unsigned b)
> > > +{
> > > + return a < b ? a : b;
> > > +}
> > > +
> > > +#define OPERATION(op, type, N, exp1, exp2) \
> > > +  unsigned u##op##type##N (unsigned a, unsigned b, unsigned c, unsigned d, \
> > > +                           unsigned e) \
> > > +  { \
> > > +    unsigned result = op (exp1, exp2); \
> > > +    if (result == c || result == c * 2) \
> > > +      return d; \
> > > +    else \
> > > +      return e; \
> > > +  }
> >
> > What is the purpose/intent of this test? IMO it seems superfluous.
> >
> > The patch is OK without it and with the above changes to the patterns.
>
> Hi Alex,
>
> Thanks for the review! I have applied the changes and committed as
> 3c378398111f7fc3c026b705e3ac088b27d4c307.

Great stuff! Just a note for the future: please add "aarch64:" at the start of
the commit message for aarch64-specific patches; it makes it easier to see the
scope of a commit at a glance (there should always be some component prefix at
the start of the commit message).
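
For example, with that convention the subject line of this patch would read:

  aarch64: Make better use of overflowing operations in max/min(a, add/sub(a, b)) [PR116815]
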
Thanks for working on this.
Alex
>
> --
> Regards,
> Dhruv
> >
> > Thanks!
> > Alex
> >
> > > +
> > > +OPERATION (max, add, 1, a, a + b)
> > > +OPERATION (max, add, 2, a, b + a)
> > > +OPERATION (max, add, 3, a + b, a)
> > > +OPERATION (max, add, 4, b + a, a)
> > > +
> > > +OPERATION (min, add, 1, a, a + b)
> > > +OPERATION (min, add, 2, a, b + a)
> > > +OPERATION (min, add, 3, a + b, a)
> > > +OPERATION (min, add, 4, b + a, a)
> > > +
> > > +OPERATION (max, sub, 1, a, a - b)
> > > +OPERATION (max, sub, 2, a - b, a)
> > > +
> > > +OPERATION (min, sub, 1, a, a - b)
> > > +OPERATION (min, sub, 2, a - b, a)
> > > +
> > > +/* { dg-final { scan-assembler-times "adds\\t" 8 } } */
> > > +/* { dg-final { scan-assembler-times "subs\\t" 4 } } */
> > > diff --git a/gcc/testsuite/gcc.target/aarch64/pr116815-3.c b/gcc/testsuite/gcc.target/aarch64/pr116815-3.c
> > > new file mode 100644
> > > index 00000000000..015c868aec2
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/aarch64/pr116815-3.c
> > > @@ -0,0 +1,44 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2" } */
> > > +
> > > +#pragma GCC target "+cssc"
> > > +
> > > +/* PR middle-end/116815 */
> > > +
> > > +/* Make sure that umax/umin instructions are generated with CSSC. */
> > > +
> > > +static inline unsigned __attribute__ ((always_inline))
> > > +max (unsigned a, unsigned b)
> > > +{
> > > + return a > b ? a : b;
> > > +}
> > > +
> > > +static inline unsigned __attribute__ ((always_inline))
> > > +min (unsigned a, unsigned b)
> > > +{
> > > + return a < b ? a : b;
> > > +}
> > > +
> > > +#define OPERATION(op, type, N, exp1, exp2) \
> > > +  unsigned u##op##type##N (unsigned a, unsigned b) { return op (exp1, exp2); }
> > > +
> > > +OPERATION (max, add, 1, a, a + b)
> > > +OPERATION (max, add, 2, a, b + a)
> > > +OPERATION (max, add, 3, a + b, a)
> > > +OPERATION (max, add, 4, b + a, a)
> > > +
> > > +OPERATION (min, add, 1, a, a + b)
> > > +OPERATION (min, add, 2, a, b + a)
> > > +OPERATION (min, add, 3, a + b, a)
> > > +OPERATION (min, add, 4, b + a, a)
> > > +
> > > +OPERATION (max, sub, 1, a, a - b)
> > > +OPERATION (max, sub, 2, a - b, a)
> > > +
> > > +OPERATION (min, sub, 1, a, a - b)
> > > +OPERATION (min, sub, 2, a - b, a)
> > > +
> > > +/* { dg-final { scan-assembler-times "umax\\t" 6 } } */
> > > +/* { dg-final { scan-assembler-times "umin\\t" 6 } } */
> > > +/* { dg-final { scan-assembler-not "adds\\t" } } */
> > > +/* { dg-final { scan-assembler-not "subs\\t" } } */
> > > diff --git a/gcc/testsuite/gcc.target/aarch64/pr116815-4.c b/gcc/testsuite/gcc.target/aarch64/pr116815-4.c
> > > new file mode 100644
> > > index 00000000000..d262d2170f3
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/aarch64/pr116815-4.c
> > > @@ -0,0 +1,60 @@
> > > +/* { dg-do run } */
> > > +/* { dg-options "-O2" } */
> > > +
> > > +/* PR middle-end/116815 */
> > > +
> > > +/* Verify that the transformation gives correct results */
> > > +
> > > +static inline unsigned __attribute__ ((always_inline))
> > > +min (unsigned a, unsigned b)
> > > +{
> > > + return (a < b) ? a : b;
> > > +}
> > > +
> > > +static inline unsigned __attribute__ ((always_inline))
> > > +max (unsigned a, unsigned b)
> > > +{
> > > + return (a > b) ? a : b;
> > > +}
> > > +
> > > +__attribute__ ((noipa)) unsigned
> > > +umaxadd (unsigned a, unsigned b)
> > > +{
> > > + return max (a + b, a);
> > > +}
> > > +
> > > +__attribute__ ((noipa)) unsigned
> > > +umaxsub (unsigned a, unsigned b)
> > > +{
> > > + return max (a - b, a);
> > > +}
> > > +
> > > +__attribute__ ((noipa)) unsigned
> > > +uminadd (unsigned a, unsigned b)
> > > +{
> > > + return min (a + b, a);
> > > +}
> > > +
> > > +__attribute__ ((noipa)) unsigned
> > > +uminsub (unsigned a, unsigned b)
> > > +{
> > > + return min (a - b, a);
> > > +}
> > > +
> > > +int
> > > +main ()
> > > +{
> > > + /* Overflows to 0x30000000. */
> > > + if (umaxadd (0x90000000, 0xa0000000) != 0x90000000)
> > > + __builtin_abort ();
> > > +
> > > + if (uminadd (0x90000000, 0xa0000000) != 0x30000000)
> > > + __builtin_abort ();
> > > +
> > > + /* Underflows to 0x60000000. */
> > > + if (umaxsub (0x00000000, 0xa0000000) != 0x60000000)
> > > + __builtin_abort ();
> > > +
> > > + if (uminsub (0x00000000, 0xa0000000) != 0x00000000)
> > > + __builtin_abort ();
> > > +}
> > > --
> > > 2.44.0