> -----Original Message----- > From: Richard Sandiford <richard.sandif...@arm.com> > Sent: Monday, June 9, 2025 2:13 PM > To: Tamar Christina <tamar.christ...@arm.com> > Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Earnshaw > <richard.earns...@arm.com>; ktkac...@gcc.gnu.org > Subject: Re: [PATCH 2/3]AArch64: Support eliding ptest on masked compares > [PR118974] > > Richard Sandiford <richard.sandif...@arm.com> writes: > > Tamar Christina <tamar.christ...@arm.com> writes: > >> In the example > >> > >> void f1 () > >> { > >> for (int i = 0; i < N; i++) > >> { > >> b[i] += a[i]; > >> if (a[i] > 0) > >> break; > >> } > >> } > >> > >> when compiled for SVE we generate: > >> > >> ld1w z28.s, p7/z, [x4, x0, lsl 2] > >> cmpgt p14.s, p7/z, z28.s, #0 > >> ptest p15, p14.b > >> b.none .L3 > >> > >> Where the ptest isn't needed since the branch only cares about the Z and NZ > >> flags. > >> > >> GCC Today supports eliding this through the pattern > *cmp<cmp_op><mode>_ptest > >> however this pattern only supports the removal when the outermost context > >> is > a > >> CMP where the predicate is inside the condition itself. > >> > >> This typically only happens for an unpredicated CMP as a ptrue will be > generated > >> during expand. > >> > >> In the case about at the GIMPLE level we have > >> > >> mask_patt_14.15_57 = vect__2.11_52 > { 0, ... }; > >> vec_mask_and_58 = loop_mask_48 & mask_patt_14.15_57; > >> if (vec_mask_and_58 != { 0, ... }) > >> goto <bb 5>; [5.50%] > >> else > >> goto <bb 6>; [94.50%] > >> > >> where the loop mask is applied to the compare as an AND. > >> > >> The loop mask is moved into the compare by the pattern > *cmp<cmp_op><mode>_and > >> which moves the mask inside if the current mask is a ptrue since > >> p && true -> p. > >> > >> However this happens after combine, and so we can't both move the predicate > >> inside AND eliminate the ptests. > >> > >> To fix this this patch adds a new pattern *cmp<cmp_op><mode>_and_ptest > which > >> combines these two patterns together allowing us to both push the predicate > >> inside and eliminate the ptest. > >> > >> After this patch we generate > >> > >> ld1w z28.s, p7/z, [x4, x0, lsl 2] > >> cmpgt p14.s, p7/z, z28.s, #0 > >> b.none .L3 > >> > >> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > >> > >> Ok for master? > >> > >> Thanks, > >> Tamar > >> > >> gcc/ChangeLog: > >> > >> PR target/118974 > >> * config/aarch64/aarch64-sve.md (*cmp<cmp_op><mode>_and_ptest): > New. > >> > >> gcc/testsuite/ChangeLog: > >> > >> PR target/118974 > >> * gcc.target/aarch64/sve/pr119351.c: Update codegen. > >> * gcc.target/aarch64/sve/vect-early-break-cbranch.c: Likewise. > >> > >> --- > >> > >> diff --git a/gcc/config/aarch64/aarch64-sve.md > b/gcc/config/aarch64/aarch64-sve.md > >> index > bf7569f932b6d7392b9c4fb7b94efafb6fd184c2..fe7f52ee1ed400b4eda28e3f90 > edc0044a5aa7a9 100644 > >> --- a/gcc/config/aarch64/aarch64-sve.md > >> +++ b/gcc/config/aarch64/aarch64-sve.md > >> @@ -8319,6 +8319,40 @@ (define_insn_and_rewrite > "*cmp<cmp_op><mode>_ptest" > >> } > >> ) > >> > >> +;; Predicated integer comparisons, formed by combining a PTRUE-predicated > >> +;; comparison with an AND in which only the flags result is interesting. > >> +(define_insn_and_rewrite "*cmp<cmp_op><mode>_and_ptest" > >> + [(set (reg:CC_NZC CC_REGNUM) > >> + (unspec:CC_NZC > >> + [(match_operand:VNx16BI 1 "register_operand") > >> + (match_operand 4) > >> + (const_int SVE_KNOWN_PTRUE) > >> + (and:<VPRED> > >> + (unspec:<VPRED> > >> + [(match_operand 5) > >> + (const_int SVE_KNOWN_PTRUE) > >> + (SVE_INT_CMP:<VPRED> > >> + (match_operand:SVE_I 2 "register_operand") > >> + (match_operand:SVE_I 3 > "aarch64_sve_cmp_<sve_imm_con>_operand"))] > >> + UNSPEC_PRED_Z) > >> + (match_operand:<VPRED> 6 "register_operand"))] > >> + UNSPEC_PTEST)) > >> + (clobber (match_scratch:<VPRED> 0))] > >> + "TARGET_SVE" > >> + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] > >> + [ &Upa , Upl, w , <sve_imm_con>; yes ] > cmp<cmp_op>\t%0.<Vetype>, %6/z, %2.<Vetype>, #%3 > >> + [ ?Upl , 0 , w , <sve_imm_con>; yes ] ^ > >> + [ Upa , Upl, w , <sve_imm_con>; no ] ^ > >> + [ &Upa , Upl, w , w ; yes ] > >> cmp<cmp_op>\t%0.<Vetype>, > %6/z, %2.<Vetype>, %3.<Vetype> > >> + [ ?Upl , 0 , w , w ; yes ] ^ > >> + [ Upa , Upl, w , w ; no ] ^ > >> + } > >> + "&& !rtx_equal_p (operands[4], operands[5])" > >> + { > >> + operands[5] = copy_rtx (operands[4]); > >> + } > >> +) > >> + > > > > This doesn't look like a legitimate fold, since the LAST flag from the > > original PTEST will apply to the last bit of the predicate, which for a > > .H, .S, or .D comparison will be zero. The fused compare will instead > > set the LAST flag based on the last full SVE_I element. > > > > For example, for a .D comparison on 128-bit SVE, we'd get (lsb first): > > > > a0000000 b0000000 > > > > where a and b are the comparison results. The ptest is testing this > > against a governing predicate of: > > > > 11111111 11111111 > > > > so LAST should always be zero. The fold above would instead set LAST to b. > > For the record, Tamar pointed out off-list that this example is bogus. > The SVE_KNOWN_PTRUE in the UNSPEC_PTEST guarantees that operand 4 is > a strict VPRED predicate (upper bits zero). So the real issue is > instead the intervening AND, which could remove initial and trailing > elements. The matched compare would ignore those ANDed off elements > (because the compare would treat them as inactive), whereas the > original PTEST would test the full VPRED predicate. > > So it is still invalid, just not for the reason that I said. > > Sorry for the confusion!
In addition we did discuss that the thing that if what you care about is Z==0 and Z==1 that the pattern would work, but I had used CC_NZC here instead of CC_Z which was my original intention. For cbranch we only care about zero and non-zero. So I will respin with that change. Tamar > > Richard > > > > > I think we need to differentiate users that care about LAST from > > those that don't, such as by using CC_Z for NE/EQ ANY/NONE comparisons. > > See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118151 for more details. > > > > Thanks, > > Richard > > > >> ;; Predicated integer comparisons, formed by combining a PTRUE-predicated > >> ;; comparison with an AND. Split the instruction into its preferred form > >> ;; at the earliest opportunity, in order to get rid of the redundant > >> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr119351.c > b/gcc/testsuite/gcc.target/aarch64/sve/pr119351.c > >> index > 85aab355f95f83e1fa65d280f14fb8ade7f7e658..1ebc735a82f4a59d8eccff3934 > 6e46a449b4729a 100644 > >> --- a/gcc/testsuite/gcc.target/aarch64/sve/pr119351.c > >> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr119351.c > >> @@ -14,7 +14,6 @@ int x[N] __attribute__((aligned(32))); > >> ** ... > >> ** ld1w z[0-9]+.s, p[0-9]+/z, \[x[0-9], x[0-9], lsl 2\] > >> ** cmple p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 > >> -** ptest p[0-9]+, p[0-9]+.b > >> ** ... > > > */ > >> > >> diff --git > >> a/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch.c > b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch.c > >> index > d7cef1105410be04ed67d1d3b800746267f205a8..8bd6fafc4d4248cf0acf7dfa2f > 07cd005f13de35 100644 > >> --- a/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch.c > >> +++ b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch.c > >> @@ -8,7 +8,6 @@ int b[N] = {0}; > >> ** f1: > >> ** ... > >> ** cmpgt p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 > >> -** ptest p[0-9]+, p[0-9]+.b > >> ** b.(any|none) \.L[0-9]+ > >> ** ... > >> */ > >> @@ -25,7 +24,6 @@ void f1 () > >> ** f2: > >> ** ... > >> ** cmpge p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 > >> -** ptest p[0-9]+, p[0-9]+.b > >> ** b.(any|none) \.L[0-9]+ > >> ** ... > >> */ > >> @@ -42,7 +40,6 @@ void f2 () > >> ** f3: > >> ** ... > >> ** cmpeq p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 > >> -** ptest p[0-9]+, p[0-9]+.b > >> ** b.(any|none) \.L[0-9]+ > >> ** ... > >> */ > >> @@ -59,7 +56,6 @@ void f3 () > >> ** f4: > >> ** ... > >> ** cmpne p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 > >> -** ptest p[0-9]+, p[0-9]+.b > >> ** b.(any|none) \.L[0-9]+ > >> ** ... > >> */ > >> @@ -76,7 +72,6 @@ void f4 () > >> ** f5: > >> ** ... > >> ** cmplt p[0-9]+.s, p7/z, z[0-9]+.s, #0 > >> -** ptest p[0-9]+, p[0-9]+.b > >> ** b.(any|none) .L[0-9]+ > >> ** ... > >> */ > >> @@ -93,7 +88,6 @@ void f5 () > >> ** f6: > >> ** ... > >> ** cmple p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 > >> -** ptest p[0-9]+, p[0-9]+.b > >> ** b.(any|none) \.L[0-9]+ > >> ** ... > >> */