On Wed, Nov 5, 2025 at 6:09 AM Dhruv Chawla <[email protected]> wrote: > > On 28/10/25 22:03, Alex Coplan wrote: > > External email: Use caution opening links or attachments > > > > > > Hi Dhruv, > > > > Sorry for the long wait on this one. Comments below ... > > Hi Alex, > > Thanks for the review. I have attached a new version of the patch > to this email. > > > > > On 18/08/2025 21:31, [email protected] wrote: > >> From: Dhruv Chawla <[email protected]> > >> > >> This patch folds the following patterns: > >> - umax (a, add (a, b)) -> [sum, ovf] = adds (a, b); !ovf ? sum : a > >> - umin (a, add (a, b)) -> [sum, ovf] = adds (a, b); !ovf ? a : sum > >> - umax (a, sub (a, b)) -> [diff, ovf] = subs (a, b); ovf ? diff : a > >> - umin (a, sub (a, b)) -> [diff, ovf] = subs (a, b); ovf ? a : diff > >> > >> Where ovf is the overflow flag. adds and subs are generated by > >> generating a parallel compare+plus/minus which maps to the pattern > >> add<mode>3_compareC. sub<mode>3_compareC is also created to have an > >> equivalent pattern for the subs instruction. In the case of subs, the > >> overflow flag represents underflow. > >> > >> This patch is a respin of the patch posted at > >> https://gcc.gnu.org/pipermail/gcc-patches/2025-May/685021.html as per > >> the suggestion to turn it into a target-specific transform by Richard > >> Biener. > >> > >> Bootstrapped and regtested on aarch64-unknown-linux-gnu. > >> > >> Signed-off-by: Dhruv Chawla <[email protected]> > >> > >> PR middle-end/116815 > >> > >> gcc/ChangeLog: > >> > >> * config/aarch64/aarch64.md (sub<mode>3_compareC): New pattern. > >> (*aarch64_plus_within_<optab><mode>3_<ovf_commutate>): Likewise. > >> (*aarch64_minus_within_<optab><mode>3): Likewise. > >> * config/aarch64/iterators.md (ovf_add_cmp): New code attribute. > >> (ovf_sub_cmp): Likewise. > >> (ovf_commutate): New iterator. > >> (ovf_comm_opp): New int attribute. > >> > >> gcc/testsuite/ChangeLog: > >> > >> * gcc.target/aarch64/pr116815-1.c: New test. > >> * gcc.target/aarch64/pr116815-2.c: Likewise. > >> * gcc.target/aarch64/pr116815-3.c: Likewise. > >> * gcc.target/aarch64/pr116815-4.c: Likewise. > >> * gcc.target/aarch64/pr116815-5.c: Likewise. > >> * gcc.target/aarch64/pr116815-6.c: Likewise. > >> --- > >> gcc/config/aarch64/aarch64.md | 76 +++++++++++ > >> gcc/config/aarch64/iterators.md | 9 ++ > >> gcc/testsuite/gcc.target/aarch64/pr116815-1.c | 120 ++++++++++++++++++ > >> gcc/testsuite/gcc.target/aarch64/pr116815-2.c | 93 ++++++++++++++ > >> gcc/testsuite/gcc.target/aarch64/pr116815-3.c | 62 +++++++++ > >> gcc/testsuite/gcc.target/aarch64/pr116815-4.c | 48 +++++++ > >> gcc/testsuite/gcc.target/aarch64/pr116815-5.c | 44 +++++++ > >> gcc/testsuite/gcc.target/aarch64/pr116815-6.c | 60 +++++++++ > >> 8 files changed, 512 insertions(+) > >> create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-1.c > >> create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-2.c > >> create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-3.c > >> create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-4.c > >> create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-5.c > >> create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-6.c > >> > >> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md > >> index 8e431a10cb1..a55fe5c85d8 100644 > >> --- a/gcc/config/aarch64/aarch64.md > >> +++ b/gcc/config/aarch64/aarch64.md > >> @@ -3715,6 +3715,20 @@ > >> [(set_attr "type" "alus_sreg")] > >> ) > >> > >> +;; An equivalent to add<mode>3_compareC > >> +(define_insn "sub<mode>3_compareC" > >> + [(set (reg:CC_C CC_REGNUM) > >> + (compare:CC_C > >> + (minus:GPI > >> + (match_operand:GPI 1 "register_operand" "r") > >> + (match_operand:GPI 2 "register_operand" "r")) > >> + (match_dup 1))) > >> + (set (match_operand:GPI 0 "register_operand" "=r") > >> + (minus:GPI (match_dup 1) (match_dup 2)))] > >> + "" > >> + "subs\t%<w>0, %<w>1, %<w>2" > >> +) > >> + > > > > I don't think we want/need this pattern. subs(a,b) underflows precisely > > when a < b, i.e. when a - b < 0 so you can just use a plain CCmode > > comparison. I think you can make use of the existing > > sub<mode>3_compare1 pattern here. > > > > This should simplify the define_insn_and_split for the subs case below. > > > >> (define_peephole2 > >> [(set (match_operand:GPI 0 "aarch64_general_reg") > >> (minus:GPI (match_operand:GPI 1 "aarch64_reg_or_zero") > >> @@ -4455,6 +4469,68 @@ > >> [(set_attr "type" "<su>div")] > >> ) > >> > >> +;; umax (a, add (a, b)) => [sum, ovf] = adds (a, b); !ovf ? sum : a > >> +;; umin (a, add (a, b)) => [sum, ovf] = adds (a, b); !ovf ? a : sum > >> +;; ... along with the commutative version of add (a, b) i.e. add (b, a). > > > > For the sake of saving one comment line, it might be better to actually > > spell out the commutated variant, AIUI e.g. for umax that's: > > umax (b, add (a, b)) => [sum, ovf] = adds (b, a); !ovf ? sum : b > > Done. I also added this to the patch description. > > > > >> +(define_insn_and_split > >> "*aarch64_plus_within_<optab><mode>3_<ovf_commutate>" > >> + [(set (match_operand:GPI 0 "register_operand" "=r") > >> + (UMAXMIN:GPI > >> + (plus:GPI (match_operand:GPI 1 "register_operand" "r") > >> + (match_operand:GPI 2 "register_operand" "r")) > >> + (match_dup ovf_commutate))) > >> + (clobber (match_scratch:GPI 3))] > > > > I think you might want an "=r" constraint on the match_scratch here. > > I am a bit confused here - I expect this pattern to only execute pre-RA > because it requires being able to call gen_reg_rtx (which only works > pre-RA). Does RA populate scratch registers by itself? Otherwise, I > don't really see any other way for this split to run post-RA. > > Uros: IIRC I ran into issues where unconditionally calling gen_reg_rtx > still caused an ICE, maybe because the code ran during reload itself. > I guess I could replace "!reload_completed" with "can_create_pseudo_p ()"?
Please see how *plus_within_<code><mode>3_<ovf_comm> and *minus_within_<code><mode>3 in config/i386/i386.md are implemented. To ensure the pattern is *really* split only before reload, you have to use: "TARGET_CMOVE && ix86_pre_reload_split ()" "#" "&& 1" aarch64 currently doesn't define aarch64_pre_reload_split, but it should. Please see the discussion in the mailing list archives [1] and related PR92140 [2]. [1] https://gcc.gnu.org/legacy-ml/gcc-patches/2019-10/msg01439.html [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140#c11 Since this is a pre-reload split, you don't need any operand constraints. > > > > >> + "!TARGET_CSSC" > >> + "#" > >> + "&& !reload_completed" > >> + [(parallel > >> + [(set (reg:CC_C CC_REGNUM) > >> + (compare:CC_C (plus:GPI (match_dup ovf_commutate) > >> + (match_dup <ovf_comm_opp>)) > >> + (match_dup ovf_commutate))) > >> + (set (match_dup 3) (plus:GPI (match_dup ovf_commutate) > >> + (match_dup <ovf_comm_opp>)))]) > >> + (set (match_dup 0) > >> + (if_then_else:GPI (<ovf_add_cmp> (reg:CC_C CC_REGNUM) > >> + (const_int 0)) > >> + (match_dup 3) > >> + (match_dup ovf_commutate)))] > >> + { > >> + if (GET_CODE (operands[3]) == SCRATCH) > >> + operands[3] = gen_reg_rtx (<MODE>mode); > > > > At first I was a bit unsure about this idiom, since md.texi:9358 has > > this to say about define_split (and therefore about > > define_insn_and_split): > > > > The @var{preparation-statements} are similar to those statements that > > are specified for @code{define_expand} (@pxref{Expander Definitions}) > > and are executed before the new RTL is generated to prepare for the > > generated code or emit some insns whose pattern is not fixed. Unlike > > those in @code{define_expand}, however, these statements must not > > generate any new pseudo-registers. Once reload has completed, they > > also > > must not allocate any space in the stack frame. > > > > but I see that we already use this idiom for a few patterns in > > aarch64-sve.md. > > Would appreciate a second opinion on this. Either the existing SVE > > patterns are > > wrong or (perhaps more likely) the md.texi docs are out of date. Scratches are not needed for pre-reload split patterns, you can always use gen_reg_rtx to obtain new pseudo. Again, please see above mentioned patterns in i386.md and discussion in the PR. Uros.
