Dhruv Chawla <dhr...@nvidia.com> writes:
> On 24/07/25 11:21, Andrew Pinski wrote:
>> External email: Use caution opening links or attachments
>> 
>> 
>> On Wed, Jul 23, 2025 at 10:16 PM <dhr...@nvidia.com> wrote:
>>>
>>> From: Dhruv Chawla <dhr...@nvidia.com>
>>>
>>> This patch folds the following patterns:
>>> - max (a, add (a, b)) -> [sum, ovf] = adds (a, b); !ovf ? sum : a
>>> - min (a, add (a, b)) -> [sum, ovf] = adds (a, b); !ovf ? a : sum
>>> - max (a, sub (a, b)) -> [sum, ovf] = subs (a, b); !ovf ? a : sum
>>> - min (a, sub (a, b)) -> [sum, ovf] = subs (a, b); !ovf ? sum : a
>>>
>>> Where ovf is the overflow flag. adds and subs are generated by
>>> generating a parallel compare+plus/minus which maps to the pattern
>>> add<mode>3_compareC. sub<mode>3_compareC is also created to have an
>>> equivalent pattern for the subs instruction.
>>>
>>> This patch is a respin of the patch posted at
>>> https://gcc.gnu.org/pipermail/gcc-patches/2025-May/685021.html as per
>>> the suggestion to turn it into a target-specific transform by Richard
>>> Biener.
>>>
>>> Bootstrapped and regtested on aarch64-unknown-linux-gnu.
>>>
>>> Signed-off-by: Dhruv Chawla <dhr...@nvidia.com>
>>>
>>>          PR middle-end/116815
>>>
>>> gcc/ChangeLog:
>>>
>>>          * config/aarch64/aarch64.md (sub<mode>3_compareC): New pattern.
>>>          (*aarch64_plus_within_<optab><mode>3_<ovf_commutate>): Likewise.
>>>          (*aarch64_minus_within_<optab><mode>3): Likewise.
>>>          * config/aarch64/iterators.md (ovf_add_cmp): New code attribute.
>>>          (ovf_sub_cmp): Likewise.
>>>          (ovf_commutate): New iterator.
>>>          (ovf_comm_opp): New int attribute.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>>          * gcc.target/aarch64/pr116815-1.c: New test.
>>>          * gcc.target/aarch64/pr116815-2.c: Likewise.
>>>          * gcc.target/aarch64/pr116815-3.c: Likewise.
>>>          * gcc.target/aarch64/pr116815-4.c: Likewise.
>>>          * gcc.target/aarch64/pr116815-5.c: Likewise.
>>> ---
>>>   gcc/config/aarch64/aarch64.md                 |  73 +++++++++++
>>>   gcc/config/aarch64/iterators.md               |   9 ++
>>>   gcc/testsuite/gcc.target/aarch64/pr116815-1.c | 119 ++++++++++++++++++
>>>   gcc/testsuite/gcc.target/aarch64/pr116815-2.c |  94 ++++++++++++++
>>>   gcc/testsuite/gcc.target/aarch64/pr116815-3.c |  63 ++++++++++
>>>   gcc/testsuite/gcc.target/aarch64/pr116815-4.c |  49 ++++++++
>>>   gcc/testsuite/gcc.target/aarch64/pr116815-5.c |  45 +++++++
>>>   7 files changed, 452 insertions(+)
>>>   create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-1.c
>>>   create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-2.c
>>>   create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-3.c
>>>   create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-4.c
>>>   create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-5.c
>>>
>>> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
>>> index a4ae6859da0..c9f88c40473 100644
>>> --- a/gcc/config/aarch64/aarch64.md
>>> +++ b/gcc/config/aarch64/aarch64.md
>>> @@ -3741,6 +3741,20 @@
>>>     [(set_attr "type" "alus_sreg")]
>>>   )
>>>
>>> +;; An equivalent to add<mode>3_compareC
>>> +(define_insn "sub<mode>3_compareC"
>>> +  [(set (reg:CC_C CC_REGNUM)
>>> +       (compare:CC_C
>>> +         (minus:GPI
>>> +           (match_operand:GPI 1 "register_operand" "r")
>>> +           (match_operand:GPI 2 "register_operand" "r"))
>>> +         (match_dup 1)))
>>> +   (set (match_operand:GPI 0 "register_operand" "=r")
>>> +       (minus:GPI (match_dup 1) (match_dup 2)))]
>>> +  ""
>>> +  "subs\t%<w>0, %<w>1, %<w>2"
>>> +)
>>> +

I think I'm going to campaign for renaming CC_C to CC_C_INV.  I spent a
while trying to understand how this pattern could be correct, before
realising that CC_C uses the opposite mapping from CC.  The mapping is:

CC:    RTL GEU == AArch64 HS == AArch64 CS  (good)
CC_C:  RTL GEU == AArch64 LO == AArch64 CC

I think the later patterns get caught by this.

>>>   (define_peephole2
>>>     [(set (match_operand:GPI 0 "aarch64_general_reg")
>>>          (minus:GPI (match_operand:GPI 1 "aarch64_reg_or_zero")
>>> @@ -4481,6 +4495,65 @@
>>>     [(set_attr "type" "<su>div")]
>>>   )
>>>
>>> +;; umax (a, add (a, b)) => [sum, ovf] = adds (a, b); !ovf ? sum : a
>>> +;; umin (a, add (a, b)) => [sum, ovf] = adds (a, b); !ovf ? a : sum
>>> +;; ... along with the commutative version of add (a, b) i.e. add (b, a)
>>> +(define_insn_and_split 
>>> "*aarch64_plus_within_<optab><mode>3_<ovf_commutate>"
>>> +  [(set (match_operand:GPI 0 "register_operand")
>>> +       (UMAXMIN:GPI
>>> +         (plus:GPI (match_operand:GPI 1 "register_operand")
>>> +                   (match_operand:GPI 2 "register_operand"))
>>> +         (match_dup <ovf_commutate>)))
>>> +   (clobber (match_scratch:GPI 3))]
>>> +  "!TARGET_CSSC"
>>> +  "#"
>>> +  "&& !reload_completed"
>> 
>> Since this is a define_insn_and_split, I think there should be
>> constraints and not just predicates on it. I don't think you can
>> depend on it being split before RA (though I could be wrong) or
>> matching post RA.

Agreed.

>>> +  [(parallel
>>> +      [(set (reg:CC_C CC_REGNUM)
>>> +           (compare:CC_C (plus:GPI (match_dup ovf_commutate)
>> 
>> I am thinking you are missing "<>" around ovf_commutate here and other
>> places below. Or did I misunderstand how this works?
>
> Hi,
>
> Yeah, this is a bit weird. It looks like the split part of a pattern does
> not require the <>. Infact, I was getting undefined iterator errors when
> I put them. Without them, the md gets correctly generated (I verified using
> "make mddump" in build/gcc/).

Yeah, since it's an iterator, using it without "<" and ">" is correct.
I think it's the (match_dup <ovf_commutate>) on the define_insn part
that's wrong -- it should be (match_dup ovf_commutate) too.
I'll look into why the parsers accept it.

>>> +                                   (match_dup <ovf_comm_opp>))
>>> +                         (match_dup ovf_commutate)))
>>> +       (set (match_dup 3) (plus:GPI (match_dup ovf_commutate)
>>> +                                   (match_dup <ovf_comm_opp>)))])
>>> +   (set (match_dup 0)
>>> +       (if_then_else:GPI (<ovf_add_cmp> (reg:CC CC_REGNUM)
>>> +                                           (const_int 0))
>>> +                         (match_dup 3)
>>> +                         (match_dup ovf_commutate)))]

This sets the carry flag in CC_Cmode but tests it in CCmode, with:

  (define_code_attr ovf_add_cmp [(umax "geu") (umin "ltu")])

So UMAX uses GEU, which for CC maps to the natural HS/CS, i.e. to
ovf rather than !ovf.  So for UMAX, this chooses the sum on overflow
and the input otherwise.  Therefore:

> +/*
> +** umaxadd1:
> +**   adds    (w[0-9]+), w0, w1
> +**   csel    w0, \1, w0, cs
> +**   ret
> +*/
> +OPERATION (max, add, 1, a, a + b)

...I don't think this is correct.  It chooses the addition result if
the ADD overflows and chooses a otherwise.  For example, after the patch:

[[gnu::noipa]] unsigned
foo (unsigned a, unsigned b)
{
  return a + b > b ? a + b : b;
}

int
main ()
{
  __builtin_printf ("%x\n", foo (0x90000000, 0xa0000000));
}

prints 30000000 when compiled at -O2 but a0000000 when compiled at -O0.

Thanks,
Richard

Reply via email to