Dhruv Chawla <dhr...@nvidia.com> writes: > On 24/07/25 11:21, Andrew Pinski wrote: >> External email: Use caution opening links or attachments >> >> >> On Wed, Jul 23, 2025 at 10:16 PM <dhr...@nvidia.com> wrote: >>> >>> From: Dhruv Chawla <dhr...@nvidia.com> >>> >>> This patch folds the following patterns: >>> - max (a, add (a, b)) -> [sum, ovf] = adds (a, b); !ovf ? sum : a >>> - min (a, add (a, b)) -> [sum, ovf] = adds (a, b); !ovf ? a : sum >>> - max (a, sub (a, b)) -> [sum, ovf] = subs (a, b); !ovf ? a : sum >>> - min (a, sub (a, b)) -> [sum, ovf] = subs (a, b); !ovf ? sum : a >>> >>> Where ovf is the overflow flag. adds and subs are generated by >>> generating a parallel compare+plus/minus which maps to the pattern >>> add<mode>3_compareC. sub<mode>3_compareC is also created to have an >>> equivalent pattern for the subs instruction. >>> >>> This patch is a respin of the patch posted at >>> https://gcc.gnu.org/pipermail/gcc-patches/2025-May/685021.html as per >>> the suggestion to turn it into a target-specific transform by Richard >>> Biener. >>> >>> Bootstrapped and regtested on aarch64-unknown-linux-gnu. >>> >>> Signed-off-by: Dhruv Chawla <dhr...@nvidia.com> >>> >>> PR middle-end/116815 >>> >>> gcc/ChangeLog: >>> >>> * config/aarch64/aarch64.md (sub<mode>3_compareC): New pattern. >>> (*aarch64_plus_within_<optab><mode>3_<ovf_commutate>): Likewise. >>> (*aarch64_minus_within_<optab><mode>3): Likewise. >>> * config/aarch64/iterators.md (ovf_add_cmp): New code attribute. >>> (ovf_sub_cmp): Likewise. >>> (ovf_commutate): New iterator. >>> (ovf_comm_opp): New int attribute. >>> >>> gcc/testsuite/ChangeLog: >>> >>> * gcc.target/aarch64/pr116815-1.c: New test. >>> * gcc.target/aarch64/pr116815-2.c: Likewise. >>> * gcc.target/aarch64/pr116815-3.c: Likewise. >>> * gcc.target/aarch64/pr116815-4.c: Likewise. >>> * gcc.target/aarch64/pr116815-5.c: Likewise. >>> --- >>> gcc/config/aarch64/aarch64.md | 73 +++++++++++ >>> gcc/config/aarch64/iterators.md | 9 ++ >>> gcc/testsuite/gcc.target/aarch64/pr116815-1.c | 119 ++++++++++++++++++ >>> gcc/testsuite/gcc.target/aarch64/pr116815-2.c | 94 ++++++++++++++ >>> gcc/testsuite/gcc.target/aarch64/pr116815-3.c | 63 ++++++++++ >>> gcc/testsuite/gcc.target/aarch64/pr116815-4.c | 49 ++++++++ >>> gcc/testsuite/gcc.target/aarch64/pr116815-5.c | 45 +++++++ >>> 7 files changed, 452 insertions(+) >>> create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-1.c >>> create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-2.c >>> create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-3.c >>> create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-4.c >>> create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-5.c >>> >>> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md >>> index a4ae6859da0..c9f88c40473 100644 >>> --- a/gcc/config/aarch64/aarch64.md >>> +++ b/gcc/config/aarch64/aarch64.md >>> @@ -3741,6 +3741,20 @@ >>> [(set_attr "type" "alus_sreg")] >>> ) >>> >>> +;; An equivalent to add<mode>3_compareC >>> +(define_insn "sub<mode>3_compareC" >>> + [(set (reg:CC_C CC_REGNUM) >>> + (compare:CC_C >>> + (minus:GPI >>> + (match_operand:GPI 1 "register_operand" "r") >>> + (match_operand:GPI 2 "register_operand" "r")) >>> + (match_dup 1))) >>> + (set (match_operand:GPI 0 "register_operand" "=r") >>> + (minus:GPI (match_dup 1) (match_dup 2)))] >>> + "" >>> + "subs\t%<w>0, %<w>1, %<w>2" >>> +) >>> +
I think I'm going to campaign for renaming CC_C to CC_C_INV. I spent a while trying to understand how this pattern could be correct, before realising that CC_C uses the opposite mapping from CC. The mapping is: CC: RTL GEU == AArch64 HS == AArch64 CS (good) CC_C: RTL GEU == AArch64 LO == AArch64 CC I think the later patterns get caught by this. >>> (define_peephole2 >>> [(set (match_operand:GPI 0 "aarch64_general_reg") >>> (minus:GPI (match_operand:GPI 1 "aarch64_reg_or_zero") >>> @@ -4481,6 +4495,65 @@ >>> [(set_attr "type" "<su>div")] >>> ) >>> >>> +;; umax (a, add (a, b)) => [sum, ovf] = adds (a, b); !ovf ? sum : a >>> +;; umin (a, add (a, b)) => [sum, ovf] = adds (a, b); !ovf ? a : sum >>> +;; ... along with the commutative version of add (a, b) i.e. add (b, a) >>> +(define_insn_and_split >>> "*aarch64_plus_within_<optab><mode>3_<ovf_commutate>" >>> + [(set (match_operand:GPI 0 "register_operand") >>> + (UMAXMIN:GPI >>> + (plus:GPI (match_operand:GPI 1 "register_operand") >>> + (match_operand:GPI 2 "register_operand")) >>> + (match_dup <ovf_commutate>))) >>> + (clobber (match_scratch:GPI 3))] >>> + "!TARGET_CSSC" >>> + "#" >>> + "&& !reload_completed" >> >> Since this is a define_insn_and_split, I think there should be >> constraints and not just predicates on it. I don't think you can >> depend on it being split before RA (though I could be wrong) or >> matching post RA. Agreed. >>> + [(parallel >>> + [(set (reg:CC_C CC_REGNUM) >>> + (compare:CC_C (plus:GPI (match_dup ovf_commutate) >> >> I am thinking you are missing "<>" around ovf_commutate here and other >> places below. Or did I misunderstand how this works? > > Hi, > > Yeah, this is a bit weird. It looks like the split part of a pattern does > not require the <>. Infact, I was getting undefined iterator errors when > I put them. Without them, the md gets correctly generated (I verified using > "make mddump" in build/gcc/). Yeah, since it's an iterator, using it without "<" and ">" is correct. I think it's the (match_dup <ovf_commutate>) on the define_insn part that's wrong -- it should be (match_dup ovf_commutate) too. I'll look into why the parsers accept it. >>> + (match_dup <ovf_comm_opp>)) >>> + (match_dup ovf_commutate))) >>> + (set (match_dup 3) (plus:GPI (match_dup ovf_commutate) >>> + (match_dup <ovf_comm_opp>)))]) >>> + (set (match_dup 0) >>> + (if_then_else:GPI (<ovf_add_cmp> (reg:CC CC_REGNUM) >>> + (const_int 0)) >>> + (match_dup 3) >>> + (match_dup ovf_commutate)))] This sets the carry flag in CC_Cmode but tests it in CCmode, with: (define_code_attr ovf_add_cmp [(umax "geu") (umin "ltu")]) So UMAX uses GEU, which for CC maps to the natural HS/CS, i.e. to ovf rather than !ovf. So for UMAX, this chooses the sum on overflow and the input otherwise. Therefore: > +/* > +** umaxadd1: > +** adds (w[0-9]+), w0, w1 > +** csel w0, \1, w0, cs > +** ret > +*/ > +OPERATION (max, add, 1, a, a + b) ...I don't think this is correct. It chooses the addition result if the ADD overflows and chooses a otherwise. For example, after the patch: [[gnu::noipa]] unsigned foo (unsigned a, unsigned b) { return a + b > b ? a + b : b; } int main () { __builtin_printf ("%x\n", foo (0x90000000, 0xa0000000)); } prints 30000000 when compiled at -O2 but a0000000 when compiled at -O0. Thanks, Richard