On 01/08/25 22:10, Richard Sandiford wrote:
External email: Use caution opening links or attachments
Dhruv Chawla <dhr...@nvidia.com> writes:
On 24/07/25 11:21, Andrew Pinski wrote:
External email: Use caution opening links or attachments
On Wed, Jul 23, 2025 at 10:16 PM <dhr...@nvidia.com> wrote:
From: Dhruv Chawla <dhr...@nvidia.com>
This patch folds the following patterns:
- max (a, add (a, b)) -> [sum, ovf] = adds (a, b); !ovf ? sum : a
- min (a, add (a, b)) -> [sum, ovf] = adds (a, b); !ovf ? a : sum
- max (a, sub (a, b)) -> [sum, ovf] = subs (a, b); !ovf ? a : sum
- min (a, sub (a, b)) -> [sum, ovf] = subs (a, b); !ovf ? sum : a
Where ovf is the overflow flag. adds and subs are generated by
generating a parallel compare+plus/minus which maps to the pattern
add<mode>3_compareC. sub<mode>3_compareC is also created to have an
equivalent pattern for the subs instruction.
This patch is a respin of the patch posted at
https://gcc.gnu.org/pipermail/gcc-patches/2025-May/685021.html as per
the suggestion to turn it into a target-specific transform by Richard
Biener.
Bootstrapped and regtested on aarch64-unknown-linux-gnu.
Signed-off-by: Dhruv Chawla <dhr...@nvidia.com>
PR middle-end/116815
gcc/ChangeLog:
* config/aarch64/aarch64.md (sub<mode>3_compareC): New pattern.
(*aarch64_plus_within_<optab><mode>3_<ovf_commutate>): Likewise.
(*aarch64_minus_within_<optab><mode>3): Likewise.
* config/aarch64/iterators.md (ovf_add_cmp): New code attribute.
(ovf_sub_cmp): Likewise.
(ovf_commutate): New iterator.
(ovf_comm_opp): New int attribute.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/pr116815-1.c: New test.
* gcc.target/aarch64/pr116815-2.c: Likewise.
* gcc.target/aarch64/pr116815-3.c: Likewise.
* gcc.target/aarch64/pr116815-4.c: Likewise.
* gcc.target/aarch64/pr116815-5.c: Likewise.
---
gcc/config/aarch64/aarch64.md | 73 +++++++++++
gcc/config/aarch64/iterators.md | 9 ++
gcc/testsuite/gcc.target/aarch64/pr116815-1.c | 119 ++++++++++++++++++
gcc/testsuite/gcc.target/aarch64/pr116815-2.c | 94 ++++++++++++++
gcc/testsuite/gcc.target/aarch64/pr116815-3.c | 63 ++++++++++
gcc/testsuite/gcc.target/aarch64/pr116815-4.c | 49 ++++++++
gcc/testsuite/gcc.target/aarch64/pr116815-5.c | 45 +++++++
7 files changed, 452 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-3.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-4.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-5.c
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index a4ae6859da0..c9f88c40473 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3741,6 +3741,20 @@
[(set_attr "type" "alus_sreg")]
)
+;; An equivalent to add<mode>3_compareC
+(define_insn "sub<mode>3_compareC"
+ [(set (reg:CC_C CC_REGNUM)
+ (compare:CC_C
+ (minus:GPI
+ (match_operand:GPI 1 "register_operand" "r")
+ (match_operand:GPI 2 "register_operand" "r"))
+ (match_dup 1)))
+ (set (match_operand:GPI 0 "register_operand" "=r")
+ (minus:GPI (match_dup 1) (match_dup 2)))]
+ ""
+ "subs\t%<w>0, %<w>1, %<w>2"
+)
+
I think I'm going to campaign for renaming CC_C to CC_C_INV. I spent a
while trying to understand how this pattern could be correct, before
realising that CC_C uses the opposite mapping from CC. The mapping is:
CC: RTL GEU == AArch64 HS == AArch64 CS (good)
CC_C: RTL GEU == AArch64 LO == AArch64 CC
I think the later patterns get caught by this.
Hi,
Wow, this is really confusing. It looks like I missed/misunderstood the
comment in aarch64-modes.def that explains this, my bad. Do you think
exchanging the operand order (for 1 and 2) would be an okay fix for
this (albeit not any less confusing)? Or should I switch this around to
using CC directly?
I also don't really understand why CC_C is used instead of CC for patterns
in general? Is it for more accurate modelling of the NZCV register for
scheduling or regalloc purposes? Or (from what seems more likely from
aarch64-modes.def), is it to map overflowing addition and subtraction to
condition flags in the same way?
(define_peephole2
[(set (match_operand:GPI 0 "aarch64_general_reg")
(minus:GPI (match_operand:GPI 1 "aarch64_reg_or_zero")
@@ -4481,6 +4495,65 @@
[(set_attr "type" "<su>div")]
)
+;; umax (a, add (a, b)) => [sum, ovf] = adds (a, b); !ovf ? sum : a
+;; umin (a, add (a, b)) => [sum, ovf] = adds (a, b); !ovf ? a : sum
+;; ... along with the commutative version of add (a, b) i.e. add (b, a)
+(define_insn_and_split "*aarch64_plus_within_<optab><mode>3_<ovf_commutate>"
+ [(set (match_operand:GPI 0 "register_operand")
+ (UMAXMIN:GPI
+ (plus:GPI (match_operand:GPI 1 "register_operand")
+ (match_operand:GPI 2 "register_operand"))
+ (match_dup <ovf_commutate>)))
+ (clobber (match_scratch:GPI 3))]
+ "!TARGET_CSSC"
+ "#"
+ "&& !reload_completed"
Since this is a define_insn_and_split, I think there should be
constraints and not just predicates on it. I don't think you can
depend on it being split before RA (though I could be wrong) or
matching post RA.
Agreed.
+ [(parallel
+ [(set (reg:CC_C CC_REGNUM)
+ (compare:CC_C (plus:GPI (match_dup ovf_commutate)
I am thinking you are missing "<>" around ovf_commutate here and other
places below. Or did I misunderstand how this works?
Hi,
Yeah, this is a bit weird. It looks like the split part of a pattern does
not require the <>. Infact, I was getting undefined iterator errors when
I put them. Without them, the md gets correctly generated (I verified using
"make mddump" in build/gcc/).
Yeah, since it's an iterator, using it without "<" and ">" is correct.
I think it's the (match_dup <ovf_commutate>) on the define_insn part
that's wrong -- it should be (match_dup ovf_commutate) too.
I'll look into why the parsers accept it.
What is the difference between surrounding an iterator with <> (such as in the
standard pattern name) versus its use in the pattern without <>?
+ (match_dup <ovf_comm_opp>))
+ (match_dup ovf_commutate)))
+ (set (match_dup 3) (plus:GPI (match_dup ovf_commutate)
+ (match_dup <ovf_comm_opp>)))])
+ (set (match_dup 0)
+ (if_then_else:GPI (<ovf_add_cmp> (reg:CC CC_REGNUM)
+ (const_int 0))
+ (match_dup 3)
+ (match_dup ovf_commutate)))]
This sets the carry flag in CC_Cmode but tests it in CCmode, with:
(define_code_attr ovf_add_cmp [(umax "geu") (umin "ltu")])
So UMAX uses GEU, which for CC maps to the natural HS/CS, i.e. to
ovf rather than !ovf. So for UMAX, this chooses the sum on overflow
and the input otherwise. Therefore:
+/*
+** umaxadd1:
+** adds (w[0-9]+), w0, w1
+** csel w0, \1, w0, cs
+** ret
+*/
+OPERATION (max, add, 1, a, a + b)
...I don't think this is correct. It chooses the addition result if
the ADD overflows and chooses a otherwise. For example, after the patch:
[[gnu::noipa]] unsigned
foo (unsigned a, unsigned b)
{
return a + b > b ? a + b : b;
}
int
main ()
{
__builtin_printf ("%x\n", foo (0x90000000, 0xa0000000));
}
prints 30000000 when compiled at -O2 but a0000000 when compiled at -O0.
Yeah, that makes sense. I thought that CC_C mapped directly to CC, so it
makes sense that the pattern breaks. Thanks for catching this! I think I
should add runtime tests as well to make sure that the behaviour is
modeled correctly.
Thanks,
Richard
--
Regards,
Dhruv