sub(a, b)) [PR116815]

Uros Bizjak Tue, 04 Nov 2025 23:32:35 -0800

On Wed, Nov 5, 2025 at 6:09 AM Dhruv Chawla <[email protected]> wrote:
>
> On 28/10/25 22:03, Alex Coplan wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > Hi Dhruv,
> >
> > Sorry for the long wait on this one.  Comments below ...
>
> Hi Alex,
>
> Thanks for the review. I have attached a new version of the patch
> to this email.
>
> >
> > On 18/08/2025 21:31, [email protected] wrote:
> >> From: Dhruv Chawla <[email protected]>
> >>
> >> This patch folds the following patterns:
> >> - umax (a, add (a, b)) -> [sum,  ovf] = adds (a, b); !ovf ? sum : a
> >> - umin (a, add (a, b)) -> [sum,  ovf] = adds (a, b); !ovf ? a : sum
> >> - umax (a, sub (a, b)) -> [diff, ovf] = subs (a, b); ovf ? diff : a
> >> - umin (a, sub (a, b)) -> [diff, ovf] = subs (a, b); ovf ? a : diff
> >>
> >> Where ovf is the overflow flag. adds and subs are generated by
> >> generating a parallel compare+plus/minus which maps to the pattern
> >> add<mode>3_compareC. sub<mode>3_compareC is also created to have an
> >> equivalent pattern for the subs instruction. In the case of subs, the
> >> overflow flag represents underflow.
> >>
> >> This patch is a respin of the patch posted at
> >> https://gcc.gnu.org/pipermail/gcc-patches/2025-May/685021.html as per
> >> the suggestion to turn it into a target-specific transform by Richard
> >> Biener.
> >>
> >> Bootstrapped and regtested on aarch64-unknown-linux-gnu.
> >>
> >> Signed-off-by: Dhruv Chawla <[email protected]>
> >>
> >>        PR middle-end/116815
> >>
> >> gcc/ChangeLog:
> >>
> >>        * config/aarch64/aarch64.md (sub<mode>3_compareC): New pattern.
> >>        (*aarch64_plus_within_<optab><mode>3_<ovf_commutate>): Likewise.
> >>        (*aarch64_minus_within_<optab><mode>3): Likewise.
> >>        * config/aarch64/iterators.md (ovf_add_cmp): New code attribute.
> >>        (ovf_sub_cmp): Likewise.
> >>        (ovf_commutate): New iterator.
> >>        (ovf_comm_opp): New int attribute.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>        * gcc.target/aarch64/pr116815-1.c: New test.
> >>        * gcc.target/aarch64/pr116815-2.c: Likewise.
> >>        * gcc.target/aarch64/pr116815-3.c: Likewise.
> >>        * gcc.target/aarch64/pr116815-4.c: Likewise.
> >>        * gcc.target/aarch64/pr116815-5.c: Likewise.
> >>        * gcc.target/aarch64/pr116815-6.c: Likewise.
> >> ---
> >>   gcc/config/aarch64/aarch64.md                 |  76 +++++++++++
> >>   gcc/config/aarch64/iterators.md               |   9 ++
> >>   gcc/testsuite/gcc.target/aarch64/pr116815-1.c | 120 ++++++++++++++++++
> >>   gcc/testsuite/gcc.target/aarch64/pr116815-2.c |  93 ++++++++++++++
> >>   gcc/testsuite/gcc.target/aarch64/pr116815-3.c |  62 +++++++++
> >>   gcc/testsuite/gcc.target/aarch64/pr116815-4.c |  48 +++++++
> >>   gcc/testsuite/gcc.target/aarch64/pr116815-5.c |  44 +++++++
> >>   gcc/testsuite/gcc.target/aarch64/pr116815-6.c |  60 +++++++++
> >>   8 files changed, 512 insertions(+)
> >>   create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-1.c
> >>   create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-2.c
> >>   create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-3.c
> >>   create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-4.c
> >>   create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-5.c
> >>   create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-6.c
> >>
> >> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> >> index 8e431a10cb1..a55fe5c85d8 100644
> >> --- a/gcc/config/aarch64/aarch64.md
> >> +++ b/gcc/config/aarch64/aarch64.md
> >> @@ -3715,6 +3715,20 @@
> >>     [(set_attr "type" "alus_sreg")]
> >>   )
> >>
> >> +;; An equivalent to add<mode>3_compareC
> >> +(define_insn "sub<mode>3_compareC"
> >> +  [(set (reg:CC_C CC_REGNUM)
> >> +     (compare:CC_C
> >> +       (minus:GPI
> >> +         (match_operand:GPI 1 "register_operand" "r")
> >> +         (match_operand:GPI 2 "register_operand" "r"))
> >> +       (match_dup 1)))
> >> +   (set (match_operand:GPI 0 "register_operand" "=r")
> >> +     (minus:GPI (match_dup 1) (match_dup 2)))]
> >> +  ""
> >> +  "subs\t%<w>0, %<w>1, %<w>2"
> >> +)
> >> +
> >
> > I don't think we want/need this pattern.  subs(a,b) underflows precisely
> > when a < b, i.e. when a - b < 0 so you can just use a plain CCmode
> > comparison.  I think you can make use of the existing
> > sub<mode>3_compare1 pattern here.
> >
> > This should simplify the define_insn_and_split for the subs case below.
> >
> >>   (define_peephole2
> >>     [(set (match_operand:GPI 0 "aarch64_general_reg")
> >>        (minus:GPI (match_operand:GPI 1 "aarch64_reg_or_zero")
> >> @@ -4455,6 +4469,68 @@
> >>     [(set_attr "type" "<su>div")]
> >>   )
> >>
> >> +;; umax (a, add (a, b)) => [sum, ovf] = adds (a, b); !ovf ? sum : a
> >> +;; umin (a, add (a, b)) => [sum, ovf] = adds (a, b); !ovf ? a : sum
> >> +;; ... along with the commutative version of add (a, b) i.e. add (b, a).
> >
> > For the sake of saving one comment line, it might be better to actually
> > spell out the commutated variant, AIUI e.g. for umax that's:
> > umax (b, add (a, b)) => [sum, ovf] = adds (b, a); !ovf ? sum : b
>
> Done. I also added this to the patch description.
>
> >
> >> +(define_insn_and_split 
> >> "*aarch64_plus_within_<optab><mode>3_<ovf_commutate>"
> >> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> >> +     (UMAXMIN:GPI
> >> +       (plus:GPI (match_operand:GPI 1 "register_operand" "r")
> >> +                 (match_operand:GPI 2 "register_operand" "r"))
> >> +       (match_dup ovf_commutate)))
> >> +   (clobber (match_scratch:GPI 3))]
> >
> > I think you might want an "=r" constraint on the match_scratch here.
>
> I am a bit confused here - I expect this pattern to only execute pre-RA
> because it requires being able to call gen_reg_rtx (which only works
> pre-RA). Does RA populate scratch registers by itself? Otherwise, I
> don't really see any other way for this split to run post-RA.
>
> Uros: IIRC I ran into issues where unconditionally calling gen_reg_rtx
> still caused an ICE, maybe because the code ran during reload itself.
> I guess I could replace "!reload_completed" with "can_create_pseudo_p ()"?


Please see how *plus_within_<code><mode>3_<ovf_comm> and
*minus_within_<code><mode>3 in config/i386/i386.md are implemented. To
ensure the pattern is *really* split only before reload, you have to
use:

  "TARGET_CMOVE
   && ix86_pre_reload_split ()"
  "#"
  "&& 1"

aarch64 currently doesn't define aarch64_pre_reload_split, but it
should. Please see the discussion in the mailing list archives [1] and
related PR92140 [2].

[1] https://gcc.gnu.org/legacy-ml/gcc-patches/2019-10/msg01439.html
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140#c11

Since this is a pre-reload split, you don't need any operand constraints.

>
> >
> >> +  "!TARGET_CSSC"
> >> +  "#"
> >> +  "&& !reload_completed"
> >> +  [(parallel
> >> +      [(set (reg:CC_C CC_REGNUM)
> >> +         (compare:CC_C (plus:GPI (match_dup ovf_commutate)
> >> +                                 (match_dup <ovf_comm_opp>))
> >> +                       (match_dup ovf_commutate)))
> >> +       (set (match_dup 3) (plus:GPI (match_dup ovf_commutate)
> >> +                                 (match_dup <ovf_comm_opp>)))])
> >> +   (set (match_dup 0)
> >> +     (if_then_else:GPI (<ovf_add_cmp> (reg:CC_C CC_REGNUM)
> >> +                                      (const_int 0))
> >> +                       (match_dup 3)
> >> +                       (match_dup ovf_commutate)))]
> >> +  {
> >> +    if (GET_CODE (operands[3]) == SCRATCH)
> >> +      operands[3] = gen_reg_rtx (<MODE>mode);
> >
> > At first I was a bit unsure about this idiom, since md.texi:9358 has
> > this to say about define_split (and therefore about
> > define_insn_and_split):
> >
> >      The @var{preparation-statements} are similar to those statements that
> >      are specified for @code{define_expand} (@pxref{Expander Definitions})
> >      and are executed before the new RTL is generated to prepare for the
> >      generated code or emit some insns whose pattern is not fixed.  Unlike
> >      those in @code{define_expand}, however, these statements must not
> >      generate any new pseudo-registers.  Once reload has completed, they 
> > also
> >      must not allocate any space in the stack frame.
> >
> > but I see that we already use this idiom for a few patterns in 
> > aarch64-sve.md.
> > Would appreciate a second opinion on this.  Either the existing SVE 
> > patterns are
> > wrong or (perhaps more likely) the md.texi docs are out of date.

Scratches are not needed for pre-reload split patterns, you can always
use gen_reg_rtx to obtain new pseudo. Again, please see above
mentioned patterns in i386.md and discussion in the PR.

Uros.

Re: [PATCH v5] [aarch64] Make better use of overflowing operations in max/min(a, add/sub(a, b)) [PR116815]

Reply via email to