On 25/11/25 19:58, Alex Coplan wrote:


On 05/11/2025 10:39, Dhruv Chawla wrote:
On 28/10/25 22:03, Alex Coplan wrote:


Hi Dhruv,

Sorry for the long wait on this one.  Comments below ...

Hi Alex,

Thanks for the review. I have attached a new version of the patch
to this email.

Thanks, and sorry to be slow again...



On 18/08/2025 21:31, [email protected] wrote:
From: Dhruv Chawla <[email protected]>

This patch folds the following patterns:
- umax (a, add (a, b)) -> [sum,  ovf] = adds (a, b); !ovf ? sum : a
- umin (a, add (a, b)) -> [sum,  ovf] = adds (a, b); !ovf ? a : sum
- umax (a, sub (a, b)) -> [diff, ovf] = subs (a, b); ovf ? diff : a
- umin (a, sub (a, b)) -> [diff, ovf] = subs (a, b); ovf ? a : diff

where ovf is the overflow flag. adds and subs are generated by emitting
a parallel compare+plus/minus that maps to the add<mode>3_compareC
pattern. sub<mode>3_compareC is also added as an equivalent pattern for
the subs instruction. In the case of subs, the overflow flag represents
underflow.

This patch is a respin of the patch posted at
https://gcc.gnu.org/pipermail/gcc-patches/2025-May/685021.html as per
the suggestion to turn it into a target-specific transform by Richard
Biener.

Bootstrapped and regtested on aarch64-unknown-linux-gnu.

Signed-off-by: Dhruv Chawla <[email protected]>

        PR middle-end/116815

gcc/ChangeLog:

        * config/aarch64/aarch64.md (sub<mode>3_compareC): New pattern.
        (*aarch64_plus_within_<optab><mode>3_<ovf_commutate>): Likewise.
        (*aarch64_minus_within_<optab><mode>3): Likewise.
        * config/aarch64/iterators.md (ovf_add_cmp): New code attribute.
        (ovf_sub_cmp): Likewise.
        (ovf_commutate): New iterator.
        (ovf_comm_opp): New int attribute.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/pr116815-1.c: New test.
        * gcc.target/aarch64/pr116815-2.c: Likewise.
        * gcc.target/aarch64/pr116815-3.c: Likewise.
        * gcc.target/aarch64/pr116815-4.c: Likewise.
        * gcc.target/aarch64/pr116815-5.c: Likewise.
        * gcc.target/aarch64/pr116815-6.c: Likewise.
---
   gcc/config/aarch64/aarch64.md                 |  76 +++++++++++
   gcc/config/aarch64/iterators.md               |   9 ++
   gcc/testsuite/gcc.target/aarch64/pr116815-1.c | 120 ++++++++++++++++++
   gcc/testsuite/gcc.target/aarch64/pr116815-2.c |  93 ++++++++++++++
   gcc/testsuite/gcc.target/aarch64/pr116815-3.c |  62 +++++++++
   gcc/testsuite/gcc.target/aarch64/pr116815-4.c |  48 +++++++
   gcc/testsuite/gcc.target/aarch64/pr116815-5.c |  44 +++++++
   gcc/testsuite/gcc.target/aarch64/pr116815-6.c |  60 +++++++++
   8 files changed, 512 insertions(+)
   create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-1.c
   create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-2.c
   create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-3.c
   create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-4.c
   create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-5.c
   create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-6.c

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 8e431a10cb1..a55fe5c85d8 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3715,6 +3715,20 @@
     [(set_attr "type" "alus_sreg")]
   )

+;; An equivalent to add<mode>3_compareC
+(define_insn "sub<mode>3_compareC"
+  [(set (reg:CC_C CC_REGNUM)
+     (compare:CC_C
+       (minus:GPI
+         (match_operand:GPI 1 "register_operand" "r")
+         (match_operand:GPI 2 "register_operand" "r"))
+       (match_dup 1)))
+   (set (match_operand:GPI 0 "register_operand" "=r")
+     (minus:GPI (match_dup 1) (match_dup 2)))]
+  ""
+  "subs\t%<w>0, %<w>1, %<w>2"
+)
+

I don't think we want/need this pattern.  subs(a,b) underflows precisely
when a < b, i.e. when a - b < 0, so you can just use a plain CCmode
comparison.  I think you can make use of the existing
sub<mode>3_compare1 pattern here.

This should simplify the define_insn_and_split for the subs case below.

   (define_peephole2
     [(set (match_operand:GPI 0 "aarch64_general_reg")
        (minus:GPI (match_operand:GPI 1 "aarch64_reg_or_zero")
@@ -4455,6 +4469,68 @@
     [(set_attr "type" "<su>div")]
   )

+;; umax (a, add (a, b)) => [sum, ovf] = adds (a, b); !ovf ? sum : a
+;; umin (a, add (a, b)) => [sum, ovf] = adds (a, b); !ovf ? a : sum
+;; ... along with the commutative version of add (a, b) i.e. add (b, a).

For the sake of saving one comment line, it might be better to actually
spell out the commutated variant, AIUI e.g. for umax that's:
umax (b, add (a, b)) => [sum, ovf] = adds (b, a); !ovf ? sum : b

Done. I also added this to the patch description.


+(define_insn_and_split "*aarch64_plus_within_<optab><mode>3_<ovf_commutate>"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+     (UMAXMIN:GPI
+       (plus:GPI (match_operand:GPI 1 "register_operand" "r")
+                 (match_operand:GPI 2 "register_operand" "r"))
+       (match_dup ovf_commutate)))
+   (clobber (match_scratch:GPI 3))]

I think you might want an "=r" constraint on the match_scratch here.

I am a bit confused here - I expect this pattern to only execute pre-RA
because it requires being able to call gen_reg_rtx (which only works
pre-RA). Does RA populate scratch registers by itself? Otherwise, I
don't really see any other way for this split to run post-RA.

So having done some digging on this, the point of this idiom:

+    if (GET_CODE (operands[3]) == SCRATCH)
+      operands[3] = gen_reg_rtx (<MODE>mode);

is to allow the splitter to work both before and after RA.  Indeed, the
RA will allocate hard registers in place of scratch registers, provided
there is a suitable constraint on the operand.

I think whatever we do the split condition should be "&& 1"; as Uros
pointed out, allowing the insn to match after RA but not allowing the
insn to be split is an ICE trap.

We then need to decide whether to restrict the define_insn_and_splits to
pre-RA only, or allow them both before and after RA.  I don't see a good
reason to forbid them from matching after RA, so I think the patch is OK
with the constraints on the scratch registers added, and the split
conditions changed to "&& 1".  I've added comments to this effect below.

Note that if we _were_ to restrict the pattern to pre-RA only, I agree
with all of Uros's comments, and we would have to define
aarch64_pre_reload_split and use that in the insn condition for the
define_insn_and_splits, but incrementally it seems easier to me to just
allow the pattern to work both before and after RA.


Uros: IIRC I ran into issues where unconditionally calling gen_reg_rtx
still caused an ICE, maybe because the code ran during reload itself.
I guess I could replace "!reload_completed" with "can_create_pseudo_p ()"?


+  "!TARGET_CSSC"
+  "#"
+  "&& !reload_completed"
+  [(parallel
+      [(set (reg:CC_C CC_REGNUM)
+         (compare:CC_C (plus:GPI (match_dup ovf_commutate)
+                                 (match_dup <ovf_comm_opp>))
+                       (match_dup ovf_commutate)))
+       (set (match_dup 3) (plus:GPI (match_dup ovf_commutate)
+                                 (match_dup <ovf_comm_opp>)))])
+   (set (match_dup 0)
+     (if_then_else:GPI (<ovf_add_cmp> (reg:CC_C CC_REGNUM)
+                                      (const_int 0))
+                       (match_dup 3)
+                       (match_dup ovf_commutate)))]
+  {
+    if (GET_CODE (operands[3]) == SCRATCH)
+      operands[3] = gen_reg_rtx (<MODE>mode);

At first I was a bit unsure about this idiom, since md.texi:9358 has
this to say about define_split (and therefore about
define_insn_and_split):

      The @var{preparation-statements} are similar to those statements that
      are specified for @code{define_expand} (@pxref{Expander Definitions})
      and are executed before the new RTL is generated to prepare for the
      generated code or emit some insns whose pattern is not fixed.  Unlike
      those in @code{define_expand}, however, these statements must not
      generate any new pseudo-registers.  Once reload has completed, they also
      must not allocate any space in the stack frame.

but I see that we already use this idiom for a few patterns in aarch64-sve.md.
Would appreciate a second opinion on this.  Either the existing SVE patterns are
wrong or (perhaps more likely) the md.texi docs are out of date.

+  }
+)
+
+;; In this case the ovf represents an underflow.
+;; umax (a, sub (a, b)) => [diff, ovf] = subs (a, b); ovf ? diff : a
+;; umin (a, sub (a, b)) => [diff, ovf] = subs (a, b); ovf ? a : diff
+(define_insn_and_split "*aarch64_minus_within_<optab><mode>3"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+     (UMAXMIN:GPI
+       (minus:GPI (match_operand:GPI 1 "register_operand" "r")
+                  (match_operand:GPI 2 "register_operand" "r"))
+       (match_dup 1)))
+   (clobber (match_scratch:GPI 3))]
+  "!TARGET_CSSC"
+  "#"
+  "&& !reload_completed"
+  [(parallel
+      [(set (reg:CC_C CC_REGNUM)
+         (compare:CC_C (minus:GPI (match_dup 1) (match_dup 2))
+                       (match_dup 1)))
+       (set (match_dup 3) (minus:GPI (match_dup 1) (match_dup 2)))])
+   ;; There is an inverse mapping here from CC_C -> CC as this requires the
+   ;; inverse of the comparison from the above pattern.

As mentioned above you should be able to do the whole thing in CCmode, which
should simplify things here.

+   (set (match_dup 0)
+         (if_then_else:GPI (<ovf_sub_cmp> (reg:CC CC_REGNUM)
+                                             (const_int 0))
+                           (match_dup 3)
+                           (match_dup 1)))]
+  {
+    if (GET_CODE (operands[3]) == SCRATCH)
+      operands[3] = gen_reg_rtx (<MODE>mode);
+  }
+)
+
   ;; -------------------------------------------------------------------
   ;; Comparison insns
   ;; -------------------------------------------------------------------
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index c3771d9402b..a29243247bd 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -2804,6 +2804,8 @@

   (define_code_iterator FMAXMIN [smax smin])

+(define_code_iterator UMAXMIN [umax umin])
+
   ;; Signed and unsigned max operations.
   (define_code_iterator USMAX [smax umax])

@@ -3092,6 +3094,9 @@

   (define_code_attr maxminand [(smax "bic") (smin "and")])

+(define_code_attr ovf_add_cmp [(umax "geu") (umin "ltu")])
+(define_code_attr ovf_sub_cmp [(umax "ltu") (umin "geu")])
+
   ;; MLA/MLS attributes.
   (define_code_attr as [(ss_plus "a") (ss_minus "s")])

@@ -5007,3 +5012,7 @@
      (UNSPEC_F2CVT "f2cvt")
      (UNSPEC_F1CVTLT "f1cvtlt")
      (UNSPEC_F2CVTLT "f2cvtlt")])
+
+;; Operand numbers for commutative operations
+(define_int_iterator ovf_commutate [1 2])

Note that this iterator already exists with the name BSL_DUP.  It might
be better to rename it to something more general and re-purpose it here,
rather than duplicating it.

Yeah, I saw that as well - using BSL_DUP and bsl_mov would achieve the same
purpose, but I am not sure what commonality between my usage and BSL could
be used for naming. ONE_TWO / two_one are the obvious names, but they don't
seem very good.

Yeah, those are the names that came to mind for me, too, but it's
arguably less readable in context, so IMO it's OK to duplicate the
iterator for the sake of better readability (as the current patch does),
if you prefer.



+(define_int_attr ovf_comm_opp [(1 "2") (2 "1")])
diff --git a/gcc/testsuite/gcc.target/aarch64/pr116815-1.c b/gcc/testsuite/gcc.target/aarch64/pr116815-1.c
new file mode 100644
index 00000000000..f3bdb794318
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr116815-1.c
@@ -0,0 +1,120 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+/* PR middle-end/116815 */
+
+/* Single-use tests.  */
+
+static inline unsigned __attribute__ ((always_inline))
+max (unsigned a, unsigned b)
+{
+  return a > b ? a : b;
+}
+
+static inline unsigned __attribute__ ((always_inline))
+min (unsigned a, unsigned b)
+{
+  return a < b ? a : b;
+}
+
+#define OPERATION(op, type, N, exp1, exp2)                                    \
+  unsigned u##op##type##N (unsigned a, unsigned b) { return op (exp1, exp2); }
+
+/*
+** umaxadd1:
+**   adds    (w[0-9]+), w0, w1
+**   csel    w0, \1, w0, cc
+**   ret
+*/
+OPERATION (max, add, 1, a, a + b)
+
+/*
+** umaxadd2:
+**   adds    (w[0-9]+), w0, w1
+**   csel    w0, \1, w0, cc
+**   ret
+*/
+OPERATION (max, add, 2, a, b + a)
+
+/*
+** umaxadd3:
+**   adds    (w[0-9]+), w0, w1
+**   csel    w0, \1, w0, cc
+**   ret
+*/
+OPERATION (max, add, 3, a + b, a)
+
+/*
+** umaxadd4:
+**   adds    (w[0-9]+), w0, w1
+**   csel    w0, \1, w0, cc
+**   ret
+*/
+OPERATION (max, add, 4, b + a, a)
+
+/*
+** uminadd1:
+**   adds    (w[0-9]+), w0, w1
+**   csel    w0, \1, w0, cs
+**   ret
+*/
+OPERATION (min, add, 1, a, a + b)
+
+/*
+** uminadd2:
+**   adds    (w[0-9]+), w0, w1
+**   csel    w0, \1, w0, cs
+**   ret
+*/
+OPERATION (min, add, 2, a, b + a)
+
+/*
+** uminadd3:
+**   adds    (w[0-9]+), w0, w1
+**   csel    w0, \1, w0, cs
+**   ret
+*/
+OPERATION (min, add, 3, a + b, a)
+
+/*
+** uminadd4:
+**   adds    (w[0-9]+), w0, w1
+**   csel    w0, \1, w0, cs
+**   ret
+*/
+OPERATION (min, add, 4, b + a, a)
+
+/* sub requires the inverse of the comparison from add.  */
+
+/*
+** umaxsub1:
+**   subs    (w[0-9]+), w0, w1
+**   csel    w0, \1, w0, cc
+**   ret
+*/
+OPERATION (max, sub, 1, a, a - b)
+
+/*
+** umaxsub2:
+**   subs    (w[0-9]+), w0, w1
+**   csel    w0, \1, w0, cc
+**   ret
+*/
+OPERATION (max, sub, 2, a - b, a)
+
+/*
+** uminsub1:
+**   subs    (w[0-9]+), w0, w1
+**   csel    w0, \1, w0, cs
+**   ret
+*/
+OPERATION (min, sub, 1, a, a - b)
+
+/*
+** uminsub2:
+**   subs    (w[0-9]+), w0, w1
+**   csel    w0, \1, w0, cs
+**   ret
+*/
+OPERATION (min, sub, 2, a - b, a)
diff --git a/gcc/testsuite/gcc.target/aarch64/pr116815-2.c b/gcc/testsuite/gcc.target/aarch64/pr116815-2.c
new file mode 100644
index 00000000000..f2dd7b0bbb3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr116815-2.c
@@ -0,0 +1,93 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* PR middle-end/116815 */
+
+/* Negative tests.  */

I don't see the benefit of including negative tests here; the tests will
get optimized out by the middle-end anyway due to C overflow rules.

Sure. I have removed this testcase in the new revision.


+
+static inline int __attribute__ ((always_inline))
+smax (int a, int b)
+{
+  return a > b ? a : b;
+}
+
+static inline int __attribute__ ((always_inline))
+smin (int a, int b)
+{
+  return a < b ? a : b;
+}
+
+static inline unsigned __attribute__ ((always_inline))
+umax (unsigned a, unsigned b)
+{
+  return a > b ? a : b;
+}
+
+static inline unsigned __attribute__ ((always_inline))
+umin (unsigned a, unsigned b)
+{
+  return a < b ? a : b;
+}
+
+#define ASSUME(cond) if (!(cond)) __builtin_unreachable ();
+
+/* This transformation does not trigger on signed types.  */
+
+int
+smax_add (int a, int b)
+{
+  ASSUME (b >= 0);
+  return smax (a, a + b);
+}
+
+int
+smin_add (int a, int b)
+{
+  ASSUME (b >= 0);
+  return smin (a, a + b);
+}
+
+int
+smax_sub (int a, int b)
+{
+  ASSUME (b >= 0);
+  return smax (a, a - b);
+}
+
+int
+smin_sub (int a, int b)
+{
+  ASSUME (b >= 0);
+  return smin (a, a - b);
+}
+
+/* Invalid patterns.  */
+
+/* This can potentially be matched, but the RHS gets factored to
+   (a + b) * b.  */
+unsigned
+umax_factored (unsigned a, unsigned b)
+{
+  return umax (a * b, a * b + b * b);
+}
+
+unsigned
+umin_mult (unsigned a, unsigned b)
+{
+  return umin (a, a * b);
+}
+
+unsigned
+umax_sub (unsigned a, unsigned b)
+{
+  return umax (a, b - a);
+}
+
+unsigned
+umin_sub (unsigned a, unsigned b)
+{
+  return umin (a, b - a);
+}
+
+/* { dg-final { scan-assembler-not "adds\\t" } } */
+/* { dg-final { scan-assembler-not "subs\\t" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/pr116815-3.c b/gcc/testsuite/gcc.target/aarch64/pr116815-3.c
new file mode 100644
index 00000000000..f8050377116
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr116815-3.c
@@ -0,0 +1,62 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+/* PR middle-end/116815 */
+
+/* Multi-use tests.  */
+
+static inline unsigned __attribute__ ((always_inline))
+max (unsigned a, unsigned b)
+{
+  return a > b ? a : b;
+}
+
+static inline unsigned __attribute__ ((always_inline))
+min (unsigned a, unsigned b)
+{
+  return a < b ? a : b;
+}
+
+/* FIXME: This should only generate one adds.  */

I appreciate you raising the shortcomings of the patch, but perhaps better just
to mention it in the cover letter rather than adding a test that explicitly
defends the suboptimal codegen.  AIUI, check-function-bodies tests are intended
for when we're confident there is one optimal sequence of codegen that we want
to generate.

I have added the FIXME as a comment in the commit message; does that seem okay?

Yep, that's fine.

This testcase is actually a leftover from when the patch was a GIMPLE transform,
because this used to work before and is a regression after switching to a
target-specific transform. I have removed this testcase now.

I see, thanks.

<snip>

 From 588dca91b719168f8ff98b7e8ee4b395bccc7cda Mon Sep 17 00:00:00 2001
From: Dhruv Chawla <[email protected]>
Date: Wed, 23 Jul 2025 01:41:51 -0700
Subject: [PATCH] [aarch64] Make better use of overflowing operations in
  max/min(a, add/sub(a, b)) [PR116815]

This patch folds the following patterns:
- For add:
   - umax (a, add (a, b)) -> [sum,  ovf] = adds (a, b); !ovf ? sum : a
   - umin (a, add (a, b)) -> [sum,  ovf] = adds (a, b); !ovf ? a : sum
     ... along with the commutated versions:
   - umax (a, add (b, a)) -> [sum,  ovf] = adds (b, a); !ovf ? sum : a
   - umin (a, add (b, a)) -> [sum,  ovf] = adds (b, a); !ovf ? a : sum
- For sub:
   - umax (a, sub (a, b)) -> [diff, udf] = subs (a, b); udf ? diff : a
   - umin (a, sub (a, b)) -> [diff, udf] = subs (a, b); udf ? a : diff

where ovf is the overflow flag and udf is the underflow flag. adds and subs
are generated by emitting parallel compare+plus/minus patterns that map to
add<mode>3_compareC and sub<mode>3_compare1.

This patch is a respin of the patch posted at
https://gcc.gnu.org/pipermail/gcc-patches/2025-May/685021.html as per
the suggestion to turn it into a target-specific transform by Richard
Biener.

FIXME: This pattern cannot currently factor multiple occurrences of the
       add expression into a single adds, e.g. max (a, a + b) + min (a + b, b)
       ends up generating two adds instructions. This is something that
       was lost when going from GIMPLE to target-specific transforms.

Bootstrapped and regtested on aarch64-unknown-linux-gnu.

Signed-off-by: Dhruv Chawla <[email protected]>

       PR middle-end/116815

gcc/ChangeLog:

       * config/aarch64/aarch64.md
       (*aarch64_plus_within_<optab><mode>3_<ovf_commutate>): New pattern.
       (*aarch64_minus_within_<optab><mode>3): Likewise.
       * config/aarch64/iterators.md (ovf_add_cmp): New code attribute.
       (udf_sub_cmp): Likewise.
       (UMAXMIN): New code iterator.
       (ovf_commutate): New iterator.
       (ovf_comm_opp): New int attribute.

gcc/testsuite/ChangeLog:

       * gcc.target/aarch64/pr116815-1.c: New test.
       * gcc.target/aarch64/pr116815-2.c: Likewise.
       * gcc.target/aarch64/pr116815-3.c: Likewise.
       * gcc.target/aarch64/pr116815-4.c: Likewise.
---
  gcc/config/aarch64/aarch64.md                 |  60 +++++++++
  gcc/config/aarch64/iterators.md               |   9 ++
  gcc/testsuite/gcc.target/aarch64/pr116815-1.c | 120 ++++++++++++++++++
  gcc/testsuite/gcc.target/aarch64/pr116815-2.c |  48 +++++++
  gcc/testsuite/gcc.target/aarch64/pr116815-3.c |  44 +++++++
  gcc/testsuite/gcc.target/aarch64/pr116815-4.c |  60 +++++++++
  6 files changed, 341 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-1.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-3.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-4.c

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 98c65a74c8e..307df4e92ba 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4488,6 +4488,66 @@
    [(set_attr "type" "<su>div")]
  )
+;; umax (a, add (a, b)) => [sum, ovf] = adds (a, b); !ovf ? sum : a
+;; umin (a, add (a, b)) => [sum, ovf] = adds (a, b); !ovf ? a : sum
+;; ... and the commutated versions:
+;; umax (a, add (b, a)) => [sum, ovf] = adds (b, a); !ovf ? sum : a
+;; umin (a, add (b, a)) => [sum, ovf] = adds (b, a); !ovf ? a : sum
+(define_insn_and_split "*aarch64_plus_within_<optab><mode>3_<ovf_commutate>"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+     (UMAXMIN:GPI
+       (plus:GPI (match_operand:GPI 1 "register_operand" "r")
+                 (match_operand:GPI 2 "register_operand" "r"))
+       (match_dup ovf_commutate)))
+   (clobber (match_scratch:GPI 3))]

Please add an "=r" constraint here.

+  "!TARGET_CSSC"
+  "#"
+  "&& !reload_completed"

This should be "&& 1".

+  [(parallel
+      [(set (reg:CC_C CC_REGNUM)
+         (compare:CC_C (plus:GPI (match_dup ovf_commutate)
+                                 (match_dup <ovf_comm_opp>))
+                       (match_dup ovf_commutate)))
+       (set (match_dup 3) (plus:GPI (match_dup ovf_commutate)
+                                 (match_dup <ovf_comm_opp>)))])
+   (set (match_dup 0)
+     (if_then_else:GPI (<ovf_add_cmp> (reg:CC_C CC_REGNUM)
+                                      (const_int 0))
+                       (match_dup 3)
+                       (match_dup ovf_commutate)))]
+  {
+    if (GET_CODE (operands[3]) == SCRATCH)
+      operands[3] = gen_reg_rtx (<MODE>mode);
+  }
+)
+
+;; umax (a, sub (a, b)) => [diff, udf] = subs (a, b); udf ? diff : a
+;; umin (a, sub (a, b)) => [diff, udf] = subs (a, b); udf ? a : diff
+(define_insn_and_split "*aarch64_minus_within_<optab><mode>3"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+     (UMAXMIN:GPI
+       (minus:GPI (match_operand:GPI 1 "register_operand" "r")
+                  (match_operand:GPI 2 "register_operand" "r"))
+       (match_dup 1)))
+   (clobber (match_scratch:GPI 3))]

Please add an "=r" constraint here.

+  "!TARGET_CSSC"
+  "#"
+  "&& !reload_completed"

This should be "&& 1".

+  [(parallel
+      [(set (reg:CC CC_REGNUM)
+         (compare:CC (match_dup 1) (match_dup 2)))
+       (set (match_dup 3) (minus:GPI (match_dup 1) (match_dup 2)))])
+   (set (match_dup 0)
+         (if_then_else:GPI (<udf_sub_cmp> (reg:CC CC_REGNUM)
+                                             (const_int 0))
+                           (match_dup 3)
+                           (match_dup 1)))]
+  {
+    if (GET_CODE (operands[3]) == SCRATCH)
+      operands[3] = gen_reg_rtx (<MODE>mode);
+  }
+)
+
  ;; -------------------------------------------------------------------
  ;; Comparison insns
  ;; -------------------------------------------------------------------
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 517b2808b5f..74186cfae8c 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -2827,6 +2827,8 @@
  (define_code_iterator FMAXMIN [smax smin])
+(define_code_iterator UMAXMIN [umax umin])
+
  ;; Signed and unsigned max operations.
  (define_code_iterator USMAX [smax umax])
@@ -3115,6 +3117,9 @@
  (define_code_attr maxminand [(smax "bic") (smin "and")])
+(define_code_attr ovf_add_cmp [(umax "geu") (umin "ltu")])
+(define_code_attr udf_sub_cmp [(umax "ltu") (umin "geu")])
+
  ;; MLA/MLS attributes.
  (define_code_attr as [(ss_plus "a") (ss_minus "s")])
@@ -5126,3 +5131,7 @@
     (UNSPEC_F2CVT "f2cvt")
     (UNSPEC_F1CVTLT "f1cvtlt")
     (UNSPEC_F2CVTLT "f2cvtlt")])
+
+;; Operand numbers for commutative operations
+(define_int_iterator ovf_commutate [1 2])
+(define_int_attr ovf_comm_opp [(1 "2") (2 "1")])
diff --git a/gcc/testsuite/gcc.target/aarch64/pr116815-1.c b/gcc/testsuite/gcc.target/aarch64/pr116815-1.c
new file mode 100644
index 00000000000..f3bdb794318
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr116815-1.c
@@ -0,0 +1,120 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+/* PR middle-end/116815 */
+
+/* Single-use tests.  */
+
+static inline unsigned __attribute__ ((always_inline))
+max (unsigned a, unsigned b)
+{
+  return a > b ? a : b;
+}
+
+static inline unsigned __attribute__ ((always_inline))
+min (unsigned a, unsigned b)
+{
+  return a < b ? a : b;
+}
+
+#define OPERATION(op, type, N, exp1, exp2)                                    \
+  unsigned u##op##type##N (unsigned a, unsigned b) { return op (exp1, exp2); }
+
+/*
+** umaxadd1:
+**   adds    (w[0-9]+), w0, w1
+**   csel    w0, \1, w0, cc
+**   ret
+*/
+OPERATION (max, add, 1, a, a + b)
+
+/*
+** umaxadd2:
+**   adds    (w[0-9]+), w0, w1
+**   csel    w0, \1, w0, cc
+**   ret
+*/
+OPERATION (max, add, 2, a, b + a)
+
+/*
+** umaxadd3:
+**   adds    (w[0-9]+), w0, w1
+**   csel    w0, \1, w0, cc
+**   ret
+*/
+OPERATION (max, add, 3, a + b, a)
+
+/*
+** umaxadd4:
+**   adds    (w[0-9]+), w0, w1
+**   csel    w0, \1, w0, cc
+**   ret
+*/
+OPERATION (max, add, 4, b + a, a)
+
+/*
+** uminadd1:
+**   adds    (w[0-9]+), w0, w1
+**   csel    w0, \1, w0, cs
+**   ret
+*/
+OPERATION (min, add, 1, a, a + b)
+
+/*
+** uminadd2:
+**   adds    (w[0-9]+), w0, w1
+**   csel    w0, \1, w0, cs
+**   ret
+*/
+OPERATION (min, add, 2, a, b + a)
+
+/*
+** uminadd3:
+**   adds    (w[0-9]+), w0, w1
+**   csel    w0, \1, w0, cs
+**   ret
+*/
+OPERATION (min, add, 3, a + b, a)
+
+/*
+** uminadd4:
+**   adds    (w[0-9]+), w0, w1
+**   csel    w0, \1, w0, cs
+**   ret
+*/
+OPERATION (min, add, 4, b + a, a)
+
+/* sub requires the inverse of the comparison from add.  */
+
+/*
+** umaxsub1:
+**   subs    (w[0-9]+), w0, w1
+**   csel    w0, \1, w0, cc
+**   ret
+*/
+OPERATION (max, sub, 1, a, a - b)
+
+/*
+** umaxsub2:
+**   subs    (w[0-9]+), w0, w1
+**   csel    w0, \1, w0, cc
+**   ret
+*/
+OPERATION (max, sub, 2, a - b, a)
+
+/*
+** uminsub1:
+**   subs    (w[0-9]+), w0, w1
+**   csel    w0, \1, w0, cs
+**   ret
+*/
+OPERATION (min, sub, 1, a, a - b)
+
+/*
+** uminsub2:
+**   subs    (w[0-9]+), w0, w1
+**   csel    w0, \1, w0, cs
+**   ret
+*/
+OPERATION (min, sub, 2, a - b, a)
diff --git a/gcc/testsuite/gcc.target/aarch64/pr116815-2.c b/gcc/testsuite/gcc.target/aarch64/pr116815-2.c
new file mode 100644
index 00000000000..eb1fdb94797
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr116815-2.c
@@ -0,0 +1,48 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* PR middle-end/116815 */
+
+/* Multi-use testcase across basic blocks.  */
+
+static inline unsigned __attribute__ ((always_inline))
+max (unsigned a, unsigned b)
+{
+  return a > b ? a : b;
+}
+
+static inline unsigned __attribute__ ((always_inline))
+min (unsigned a, unsigned b)
+{
+  return a < b ? a : b;
+}
+
+#define OPERATION(op, type, N, exp1, exp2)                                    \
+  unsigned u##op##type##N (unsigned a, unsigned b, unsigned c, unsigned d,    \
+                        unsigned e)                                           \
+  {                                                                           \
+    unsigned result = op (exp1, exp2);                                        \
+    if (result == c || result == c * 2)                                       \
+      return d;                                                               \
+    else                                                                      \
+      return e;                                                               \
+  }

What is the purpose/intent of this test?  IMO it seems superfluous.

The patch is OK without it and with the above changes to the patterns.

Hi Alex,

Thanks for the review! I have applied the changes and committed as
3c378398111f7fc3c026b705e3ac088b27d4c307.

--
Regards,
Dhruv

Thanks!
Alex

+
+OPERATION (max, add, 1, a, a + b)
+OPERATION (max, add, 2, a, b + a)
+OPERATION (max, add, 3, a + b, a)
+OPERATION (max, add, 4, b + a, a)
+
+OPERATION (min, add, 1, a, a + b)
+OPERATION (min, add, 2, a, b + a)
+OPERATION (min, add, 3, a + b, a)
+OPERATION (min, add, 4, b + a, a)
+
+OPERATION (max, sub, 1, a, a - b)
+OPERATION (max, sub, 2, a - b, a)
+
+OPERATION (min, sub, 1, a, a - b)
+OPERATION (min, sub, 2, a - b, a)
+
+/* { dg-final { scan-assembler-times "adds\\t" 8 } } */
+/* { dg-final { scan-assembler-times "subs\\t" 4 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/pr116815-3.c b/gcc/testsuite/gcc.target/aarch64/pr116815-3.c
new file mode 100644
index 00000000000..015c868aec2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr116815-3.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#pragma GCC target "+cssc"
+
+/* PR middle-end/116815 */
+
+/* Make sure that umax/umin instructions are generated with CSSC.  */
+
+static inline unsigned __attribute__ ((always_inline))
+max (unsigned a, unsigned b)
+{
+  return a > b ? a : b;
+}
+
+static inline unsigned __attribute__ ((always_inline))
+min (unsigned a, unsigned b)
+{
+  return a < b ? a : b;
+}
+
+#define OPERATION(op, type, N, exp1, exp2)                                    \
+  unsigned u##op##type##N (unsigned a, unsigned b) { return op (exp1, exp2); }
+
+OPERATION (max, add, 1, a, a + b)
+OPERATION (max, add, 2, a, b + a)
+OPERATION (max, add, 3, a + b, a)
+OPERATION (max, add, 4, b + a, a)
+
+OPERATION (min, add, 1, a, a + b)
+OPERATION (min, add, 2, a, b + a)
+OPERATION (min, add, 3, a + b, a)
+OPERATION (min, add, 4, b + a, a)
+
+OPERATION (max, sub, 1, a, a - b)
+OPERATION (max, sub, 2, a - b, a)
+
+OPERATION (min, sub, 1, a, a - b)
+OPERATION (min, sub, 2, a - b, a)
+
+/* { dg-final { scan-assembler-times "umax\\t" 6 } } */
+/* { dg-final { scan-assembler-times "umin\\t" 6 } } */
+/* { dg-final { scan-assembler-not "adds\\t" } } */
+/* { dg-final { scan-assembler-not "subs\\t" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/pr116815-4.c b/gcc/testsuite/gcc.target/aarch64/pr116815-4.c
new file mode 100644
index 00000000000..d262d2170f3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr116815-4.c
@@ -0,0 +1,60 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+/* PR middle-end/116815 */
+
+/* Verify that the transformation gives correct results.  */
+
+static inline unsigned __attribute__ ((always_inline))
+min (unsigned a, unsigned b)
+{
+  return (a < b) ? a : b;
+}
+
+static inline unsigned __attribute__ ((always_inline))
+max (unsigned a, unsigned b)
+{
+  return (a > b) ? a : b;
+}
+
+__attribute__ ((noipa)) unsigned
+umaxadd (unsigned a, unsigned b)
+{
+  return max (a + b, a);
+}
+
+__attribute__ ((noipa)) unsigned
+umaxsub (unsigned a, unsigned b)
+{
+  return max (a - b, a);
+}
+
+__attribute__ ((noipa)) unsigned
+uminadd (unsigned a, unsigned b)
+{
+  return min (a + b, a);
+}
+
+__attribute__ ((noipa)) unsigned
+uminsub (unsigned a, unsigned b)
+{
+  return min (a - b, a);
+}
+
+int
+main ()
+{
+  /* Overflows to 0x30000000.  */
+  if (umaxadd (0x90000000, 0xa0000000) != 0x90000000)
+    __builtin_abort ();
+
+  if (uminadd (0x90000000, 0xa0000000) != 0x30000000)
+    __builtin_abort ();
+
+  /* Underflows to 0x60000000.  */
+  if (umaxsub (0x00000000, 0xa0000000) != 0x60000000)
+    __builtin_abort ();
+
+  if (uminsub (0x00000000, 0xa0000000) != 0x00000000)
+    __builtin_abort ();
+}
--
2.44.0
