RE: [PATCH 2/3]AArch64: Support eliding ptest on masked compares [PR118974]

Tamar Christina Mon, 09 Jun 2025 06:28:07 -0700

> -----Original Message-----
> From: Richard Sandiford <richard.sandif...@arm.com>
> Sent: Monday, June 9, 2025 2:13 PM
> To: Tamar Christina <tamar.christ...@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Earnshaw
> <richard.earns...@arm.com>; ktkac...@gcc.gnu.org
> Subject: Re: [PATCH 2/3]AArch64: Support eliding ptest on masked compares
> [PR118974]
> 
> Richard Sandiford <richard.sandif...@arm.com> writes:
> > Tamar Christina <tamar.christ...@arm.com> writes:
> >> In the example
> >>
> >> void f1 ()
> >> {
> >>   for (int i = 0; i < N; i++)
> >>     {
> >>       b[i] += a[i];
> >>       if (a[i] > 0)
> >>    break;
> >>     }
> >> }
> >>
> >> when compiled for SVE we generate:
> >>
> >>         ld1w    z28.s, p7/z, [x4, x0, lsl 2]
> >>         cmpgt   p14.s, p7/z, z28.s, #0
> >>         ptest   p15, p14.b
> >>         b.none  .L3
> >>
> >> Where the ptest isn't needed since the branch only cares about the Z and NZ
> >> flags.
> >>
> >> GCC Today supports eliding this through the pattern
> *cmp<cmp_op><mode>_ptest
> >> however this pattern only supports the removal when the outermost context 
> >> is
> a
> >> CMP where the predicate is inside the condition itself.
> >>
> >> This typically only happens for an unpredicated CMP as a ptrue will be
> generated
> >> during expand.
> >>
> >> In the case about at the GIMPLE level we have
> >>
> >>   mask_patt_14.15_57 = vect__2.11_52 > { 0, ... };
> >>   vec_mask_and_58 = loop_mask_48 & mask_patt_14.15_57;
> >>   if (vec_mask_and_58 != { 0, ... })
> >>     goto <bb 5>; [5.50%]
> >>   else
> >>     goto <bb 6>; [94.50%]
> >>
> >> where the loop mask is applied to the compare as an AND.
> >>
> >> The loop mask is moved into the compare by the pattern
> *cmp<cmp_op><mode>_and
> >> which moves the mask inside if the current mask is a ptrue since
> >> p && true -> p.
> >>
> >> However this happens after combine, and so we can't both move the predicate
> >> inside AND eliminate the ptests.
> >>
> >> To fix this this patch adds a new pattern *cmp<cmp_op><mode>_and_ptest
> which
> >> combines these two patterns together allowing us to both push the predicate
> >> inside and eliminate the ptest.
> >>
> >> After this patch we generate
> >>
> >>         ld1w    z28.s, p7/z, [x4, x0, lsl 2]
> >>         cmpgt   p14.s, p7/z, z28.s, #0
> >>         b.none  .L3
> >>
> >> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >>
> >> Ok for master?
> >>
> >> Thanks,
> >> Tamar
> >>
> >> gcc/ChangeLog:
> >>
> >>    PR target/118974
> >>    * config/aarch64/aarch64-sve.md (*cmp<cmp_op><mode>_and_ptest):
> New.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>    PR target/118974
> >>    * gcc.target/aarch64/sve/pr119351.c: Update codegen.
> >>    * gcc.target/aarch64/sve/vect-early-break-cbranch.c: Likewise.
> >>
> >> ---
> >>
> >> diff --git a/gcc/config/aarch64/aarch64-sve.md
> b/gcc/config/aarch64/aarch64-sve.md
> >> index
> bf7569f932b6d7392b9c4fb7b94efafb6fd184c2..fe7f52ee1ed400b4eda28e3f90
> edc0044a5aa7a9 100644
> >> --- a/gcc/config/aarch64/aarch64-sve.md
> >> +++ b/gcc/config/aarch64/aarch64-sve.md
> >> @@ -8319,6 +8319,40 @@ (define_insn_and_rewrite
> "*cmp<cmp_op><mode>_ptest"
> >>    }
> >>  )
> >>
> >> +;; Predicated integer comparisons, formed by combining a PTRUE-predicated
> >> +;; comparison with an AND in which only the flags result is interesting.
> >> +(define_insn_and_rewrite "*cmp<cmp_op><mode>_and_ptest"
> >> +  [(set (reg:CC_NZC CC_REGNUM)
> >> +  (unspec:CC_NZC
> >> +    [(match_operand:VNx16BI 1 "register_operand")
> >> +     (match_operand 4)
> >> +     (const_int SVE_KNOWN_PTRUE)
> >> +     (and:<VPRED>
> >> +       (unspec:<VPRED>
> >> +         [(match_operand 5)
> >> +          (const_int SVE_KNOWN_PTRUE)
> >> +          (SVE_INT_CMP:<VPRED>
> >> +            (match_operand:SVE_I 2 "register_operand")
> >> +            (match_operand:SVE_I 3
> "aarch64_sve_cmp_<sve_imm_con>_operand"))]
> >> +         UNSPEC_PRED_Z)
> >> +       (match_operand:<VPRED> 6 "register_operand"))]
> >> +    UNSPEC_PTEST))
> >> +   (clobber (match_scratch:<VPRED> 0))]
> >> +  "TARGET_SVE"
> >> +  {@ [ cons: =0, 1    , 2 , 3            ; attrs: pred_clobber ]
> >> +     [ &Upa    ,  Upl, w , <sve_imm_con>; yes                 ]
> cmp<cmp_op>\t%0.<Vetype>, %6/z, %2.<Vetype>, #%3
> >> +     [ ?Upl    ,  0  , w , <sve_imm_con>; yes                 ] ^
> >> +     [ Upa     ,  Upl, w , <sve_imm_con>; no                  ] ^
> >> +     [ &Upa    ,  Upl, w , w            ; yes                 ] 
> >> cmp<cmp_op>\t%0.<Vetype>,
> %6/z, %2.<Vetype>, %3.<Vetype>
> >> +     [ ?Upl    ,  0  , w , w            ; yes                 ] ^
> >> +     [ Upa     ,  Upl, w , w            ; no                  ] ^
> >> +  }
> >> +  "&& !rtx_equal_p (operands[4], operands[5])"
> >> +  {
> >> +    operands[5] = copy_rtx (operands[4]);
> >> +  }
> >> +)
> >> +
> >
> > This doesn't look like a legitimate fold, since the LAST flag from the
> > original PTEST will apply to the last bit of the predicate, which for a
> > .H, .S, or .D comparison will be zero.  The fused compare will instead
> > set the LAST flag based on the last full SVE_I element.
> >
> > For example, for a .D comparison on 128-bit SVE, we'd get (lsb first):
> >
> >    a0000000 b0000000
> >
> > where a and b are the comparison results.  The ptest is testing this
> > against a governing predicate of:
> >
> >    11111111 11111111
> >
> > so LAST should always be zero.  The fold above would instead set LAST to b.
> 
> For the record, Tamar pointed out off-list that this example is bogus.
> The SVE_KNOWN_PTRUE in the UNSPEC_PTEST guarantees that operand 4 is
> a strict VPRED predicate (upper bits zero).  So the real issue is
> instead the intervening AND, which could remove initial and trailing
> elements.  The matched compare would ignore those ANDed off elements
> (because the compare would treat them as inactive), whereas the
> original PTEST would test the full VPRED predicate.
> 
> So it is still invalid, just not for the reason that I said.
> 
> Sorry for the confusion!


In addition we did discuss that the thing that if what you care about is Z==0
and Z==1 that the pattern would work, but I had used CC_NZC here instead of
CC_Z which was my original intention.  For cbranch we only care about zero
and non-zero.

So I will respin with that change.

Tamar

> 
> Richard
> 
> >
> > I think we need to differentiate users that care about LAST from
> > those that don't, such as by using CC_Z for NE/EQ ANY/NONE comparisons.
> > See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118151 for more details.
> >
> > Thanks,
> > Richard
> >
> >>  ;; Predicated integer comparisons, formed by combining a PTRUE-predicated
> >>  ;; comparison with an AND.  Split the instruction into its preferred form
> >>  ;; at the earliest opportunity, in order to get rid of the redundant
> >> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr119351.c
> b/gcc/testsuite/gcc.target/aarch64/sve/pr119351.c
> >> index
> 85aab355f95f83e1fa65d280f14fb8ade7f7e658..1ebc735a82f4a59d8eccff3934
> 6e46a449b4729a 100644
> >> --- a/gcc/testsuite/gcc.target/aarch64/sve/pr119351.c
> >> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr119351.c
> >> @@ -14,7 +14,6 @@ int x[N] __attribute__((aligned(32)));
> >>  **        ...
> >>  **        ld1w    z[0-9]+.s, p[0-9]+/z, \[x[0-9], x[0-9], lsl 2\]
> >>  **        cmple   p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
> >> -**        ptest   p[0-9]+, p[0-9]+.b
> >>  **        ...
> >  >  */
> >>
> >> diff --git 
> >> a/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch.c
> b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch.c
> >> index
> d7cef1105410be04ed67d1d3b800746267f205a8..8bd6fafc4d4248cf0acf7dfa2f
> 07cd005f13de35 100644
> >> --- a/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch.c
> >> +++ b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch.c
> >> @@ -8,7 +8,6 @@ int b[N] = {0};
> >>  ** f1:
> >>  **        ...
> >>  **        cmpgt   p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
> >> -**        ptest   p[0-9]+, p[0-9]+.b
> >>  **        b.(any|none)    \.L[0-9]+
> >>  **        ...
> >>  */
> >> @@ -25,7 +24,6 @@ void f1 ()
> >>  ** f2:
> >>  **        ...
> >>  **        cmpge   p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
> >> -**        ptest   p[0-9]+, p[0-9]+.b
> >>  **        b.(any|none)    \.L[0-9]+
> >>  **        ...
> >>  */
> >> @@ -42,7 +40,6 @@ void f2 ()
> >>  ** f3:
> >>  **        ...
> >>  **        cmpeq   p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
> >> -**        ptest   p[0-9]+, p[0-9]+.b
> >>  **        b.(any|none)    \.L[0-9]+
> >>  **        ...
> >>  */
> >> @@ -59,7 +56,6 @@ void f3 ()
> >>  ** f4:
> >>  **        ...
> >>  **        cmpne   p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
> >> -**        ptest   p[0-9]+, p[0-9]+.b
> >>  **        b.(any|none)    \.L[0-9]+
> >>  **        ...
> >>  */
> >> @@ -76,7 +72,6 @@ void f4 ()
> >>  ** f5:
> >>  **        ...
> >>  **        cmplt   p[0-9]+.s, p7/z, z[0-9]+.s, #0
> >> -**        ptest   p[0-9]+, p[0-9]+.b
> >>  **        b.(any|none)    .L[0-9]+
> >>  **        ...
> >>  */
> >> @@ -93,7 +88,6 @@ void f5 ()
> >>  ** f6:
> >>  **        ...
> >>  **        cmple   p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
> >> -**        ptest   p[0-9]+, p[0-9]+.b
> >>  **        b.(any|none)    \.L[0-9]+
> >>  **        ...
> >>  */

RE: [PATCH 2/3]AArch64: Support eliding ptest on masked compares [PR118974]

Reply via email to