(on mobile so doing a top reply)

> So it isn't as efficient to use cbranch_any (g != 0) here? I think it
> should be practically equivalent...
Ah yeah, it can expand to what we currently expand the vector boolean to.
I was initially confused because for SVE what we want here is an ORRS
(flag-setting inclusive ORR).  Using this optab we can get to that in an
easier way too.

So yeah, I agree, cbranch for vectors can be deprecated.  Note that in my
patch I named the new one vec_cbranch_any/all to implicitly say they are
only for vectors.

Do you want to fully deprecate cbranch for vectors?  This would mean
though that all target checks need to be updated, unless we update the
support checks with a helper?

Thanks,
Tamar

________________________________
From: Richard Biener <rguent...@suse.de>
Sent: Wednesday, July 9, 2025 1:24 PM
To: Tamar Christina <tamar.christ...@arm.com>
Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>; nd <n...@arm.com>
Subject: RE: [PATCH 1/3]middle-end: support vec_cbranch_any and vec_cbranch_all [PR118974]

On Wed, 9 Jul 2025, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <rguent...@suse.de>
> > Sent: Wednesday, July 9, 2025 12:36 PM
> > To: Tamar Christina <tamar.christ...@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>
> > Subject: RE: [PATCH 1/3]middle-end: support vec_cbranch_any and
> > vec_cbranch_all [PR118974]
> >
> > On Wed, 9 Jul 2025, Tamar Christina wrote:
> >
> > > > > +Operand 0 is a comparison operator.  Operand 1 and operand 2 are the
> > > > > +first and second operands of the comparison, respectively.  Operand 3
> > > > > +is the @code{code_label} to jump to.
> > > > > +
> > > > > +@cindex @code{cbranch_all@var{mode}4} instruction pattern
> > > > > +@item @samp{cbranch_all@var{mode}4}
> > > > > +Conditional branch instruction combined with a compare instruction on
> > > > > +vectors where it is required that all of the elementwise comparisons
> > > > > +of the two input vectors are true.
> > > >
> > > > See above.
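A scalar model of what the any/all optabs above test may help; this is
illustrative C only, not GCC code, and the helper names any_set/all_set
are made up.  It also checks exhaustively that a "none" test is
expressible as ALL on the inverted lane compare, which is the
equivalence the thread relies on.

```c
#include <stdbool.h>

#define LANES 4

/* Illustrative sketch only (not GCC code): model a vector compare
   result as an array of lane booleans.  */
static bool any_set (const bool m[LANES])
{
  for (int i = 0; i < LANES; i++)
    if (m[i])
      return true;
  return false;
}

static bool all_set (const bool m[LANES])
{
  for (int i = 0; i < LANES; i++)
    if (!m[i])
      return false;
  return true;
}

/* "none" is expressible as ALL of the inverted lane compare, and the
   remaining states fall out of flipping which edge of the branch is
   taken.  Checked exhaustively over every possible mask.  */
static bool any_all_cover_none (void)
{
  for (unsigned bits = 0; bits < (1u << LANES); bits++)
    {
      bool m[LANES], inv[LANES];
      for (int i = 0; i < LANES; i++)
        {
          m[i] = (bits >> i) & 1;
          inv[i] = !m[i];
        }
      /* branch-if-none (m) == branch-if-all (!m).  */
      if (!any_set (m) != all_set (inv))
        return false;
      /* branch-if-not-all (m) == branch-if-any (!m).  */
      if (!all_set (m) != any_set (inv))
        return false;
    }
  return true;
}
```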
> > > >
> > > > When I look at the RTL for aarch64 I wonder whether the middle-end
> > > > can still invert a jump (for BB reorder, for example)?  Without
> > > > a cbranch_none expander we have to invert during RTL expansion?
> > > >
> > >
> > > Isn't cbranch_none just cbranch_all x 0? i.e. all values must be zero.
> > > I think all states are expressible with any and all and flipping the
> > > branches, so it shouldn't be any more restrictive than cbranch itself
> > > is today.
> > >
> > > cbranch also only supports eq and ne, so none would be cbranch (eq x 0)
> > >
> > > and FTR the RTL generated for AArch64 (both SVE and Adv. SIMD) will be
> > > simplified to:
> > >
> > > (insn 23 22 24 5 (parallel [
> > >             (set (reg:VNx4BI 128 [ mask_patt_14.15_57 ])
> > >                 (unspec:VNx4BI [
> > >                         (reg:VNx4BI 129)
> > >                         (const_int 0 [0x0])
> > >                         (gt:VNx4BI (reg:VNx4SI 114 [ vect__2.11 ])
> > >                             (const_vector:VNx4SI repeat [
> > >                                     (const_int 0 [0])
> > >                                 ]))
> > >                     ] UNSPEC_PRED_Z))
> > >             (clobber (reg:CC_NZC 66 cc))
> > >         ]) "cbranch.c":25:10 -1
> > >
> > > (jump_insn 27 26 28 5 (set (pc)
> > >         (if_then_else (eq (reg:CC_Z 66 cc)
> > >                 (const_int 0 [0]))
> > >             (label_ref 33)
> > >             (pc))) "cbranch.c":25:10 -1
> > >      (int_list:REG_BR_PROB 1014686025 (nil))
> > >
> > > The thing is we can't get rid of the unspecs as there's no concept of
> > > masking in RTL compares.  We could technically do an AND (and do in
> > > some cases) but then you lose the predicate hint constant in the RTL
> > > which tells you whether the mask is known to be all true or not.
> > > This hint is crucial to allow for further optimizations.
> > >
> > > That said, the condition code, branch and compares are fully exposed.
> > >
> > > We expand to a larger sequence than I'd like, mostly because there's
> > > no support for conditional cbranch optabs, or even conditional vector
> > > comparisons.
> > > So the comparisons must be generated unpredicated by generating an
> > > all-true mask, and later patterns merge in the AND.
> > >
> > > The new patterns allow us to clean up codegen for Adv. SIMD + SVE (in
> > > a single loop) but not pure SVE, for which I take a different approach
> > > to try to avoid requiring a predicated version of these optabs.
> > >
> > > I don't want to push my luck, but would you be ok with a conditional
> > > version of these optabs too?  i.e. cond_cbranch_any and
> > > cond_cbranch_all?  This would allow us to immediately expand to the
> > > correct representation for both SVE and Adv. SIMD without having to
> > > rely on various combine patterns and cc-fusion to optimize the
> > > sequences later on (which has historically been a bit hit or miss if
> > > someone adds a new CC pattern).
> >
> > Can you explain?  A conditional conditional branch makes my head hurt.
> > It's really a cbranch_{all,any} where the (vector) compare has an
> > additional mask applied?  So cbranch_cond_{all,any} would be a better
> > fit?
>
> Yeah, so cbranch is itself in GIMPLE
>
>   c = vec_a `cmp` vec_b
>   if (c {any,all} 0)
>     ...
>
> where cbranch_{all,any} represents the GIMPLE
>
>   if (vec_a `cmp` vec_b)
>     ...
>
> and cbranch_cond_{all,any} would represent
>
>   if ((vec_a `cmp` vec_b) & loop_mask)
>     ...
>
> In GIMPLE we mask most operations by ANDing the mask with the result
> of the operation.  But cbranch doesn't have an LHS, so we can't wrap
> the & around anything.  And because of this we rely on backend patterns
> to later push the mask from the & into the operation such that we can
> generate the predicated compare.

OK, so we could already do

  c = .COND_`cmp` (vec_a, vec_b, loop_mask, { 0, 0... });
  if (c {any,all} 0)

but I can see how cond_cbranch is useful (besides choosing a proper
name).
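For reference, a source loop of the shape these GIMPLE forms come from
might look like the sketch below; the function is made up for
illustration and is not taken from the patch.  After vectorization the
compare produces a vector mask and the early exit becomes a branch on
whether any mask element is set, i.e. the cbranch_any form.

```c
/* Illustrative only: an early-break loop of the shape under
   discussion.  Vectorized, "a[i] > 0" yields a lane mask and the
   "return i" early exit becomes "branch if ANY mask lane is set".
   A loop exiting once every lane satisfies the condition would use
   the cbranch_all form instead.  */
int first_positive (const int *a, int n)
{
  for (int i = 0; i < n; i++)
    if (a[i] > 0)
      return i;
  return -1;
}
```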
> Because of this we end up requiring patterns such as
>
> ;; Predicated integer comparisons, formed by combining a PTRUE-predicated
> ;; comparison with an AND in which only the flags result is interesting.
> (define_insn_and_rewrite "*cmp<cmp_op><mode>_and_ptest"
>   [(set (reg:CC_Z CC_REGNUM)
>         (unspec:CC_Z
>           [(match_operand:VNx16BI 1 "register_operand")
>            (match_operand 4)
>            (const_int SVE_KNOWN_PTRUE)
>            (and:<VPRED>
>              (unspec:<VPRED>
>                [(match_operand 5)
>                 (const_int SVE_KNOWN_PTRUE)
>                 (SVE_INT_CMP:<VPRED>
>                   (match_operand:SVE_I 2 "register_operand")
>                   (match_operand:SVE_I 3 "aarch64_sve_cmp_<sve_imm_con>_operand"))]
>                UNSPEC_PRED_Z)
>              (match_operand:<VPRED> 6 "register_operand"))]
>           UNSPEC_PTEST))
>    (clobber (match_scratch:<VPRED> 0))]
>   "TARGET_SVE"
>   {@ [ cons: =0, 1  , 2, 3            ; attrs: pred_clobber ]
>      [ &Upa    , Upl, w, <sve_imm_con>; yes                 ]
>        cmp<cmp_op>\t%0.<Vetype>, %6/z, %2.<Vetype>, #%3
>      [ ?Upl    , 0  , w, <sve_imm_con>; yes                 ] ^
>      [ Upa     , Upl, w, <sve_imm_con>; no                  ] ^
>      [ &Upa    , Upl, w, w            ; yes                 ]
>        cmp<cmp_op>\t%0.<Vetype>, %6/z, %2.<Vetype>, %3.<Vetype>
>      [ ?Upl    , 0  , w, w            ; yes                 ] ^
>      [ Upa     , Upl, w, w            ; no                  ] ^
>   }
>   "&& !rtx_equal_p (operands[4], operands[5])"
>   {
>     operands[5] = copy_rtx (operands[4]);
>   }
> )
>
> and rely on combine to eventually connect enough pieces back together.
> With the masked cbranch patterns we can just emit the simplified RTL:
>
> (insn 23 22 24 5 (parallel [
>             (set (reg:VNx4BI 128 [ mask_patt_14.15_57 ])
>                 (unspec:VNx4BI [
>                         (reg:VNx4BI 129)
>                         (const_int 0 [0x0])
>                         (gt:VNx4BI (reg:VNx4SI 114 [ vect__2.11 ])
>                             (const_vector:VNx4SI repeat [
>                                     (const_int 0 [0])
>                                 ]))
>                     ] UNSPEC_PRED_Z))
>             (clobber (reg:CC_NZC 66 cc))
>         ]) "cbranch.c":25:10 -1
>
> immediately out of expand.
>
> > Though 'cond_' elsewhere suggests there is an else value; instead
> > cbranch_mask_{all,any}?  Or would that not capture things exactly?
>
> Yeah, mask is fine too.  I only suggested cond since that's what we
> normally use, but mask works too.
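The semantics of the masked variant being discussed can be modelled
scalar-wise; the helper below is an illustration only, not GCC code,
and its name is made up.  It shows what the branch condition computes:
whether any lane enabled by the loop mask also satisfies the compare,
i.e. the `if ((vec_a > vec_b) & loop_mask)` form.

```c
#include <stdbool.h>

/* Scalar model of a masked "any" compare-and-branch: take the branch
   iff some lane where mask[i] is set also has a[i] > b[i].  Lanes
   with mask[i] clear never contribute, which is what the predicated
   SVE compare expresses directly.  */
static bool masked_any_gt (const int *a, const int *b,
                           const bool *mask, int n)
{
  for (int i = 0; i < n; i++)
    if (mask[i] && a[i] > b[i])
      return true;
  return false;
}
```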
cond_cbranch didn't seem to imply the cond_ is applied to the compare,
but from a 2nd look it might be OK after all.

> > cbranch is compare-and-branch, so masked-compare-and-branch aka
> > mcbranch[_{all,any}]?
> >
> > And writing this and not trying to remember everything said, it
> > appears that 'cbranch' itself (for vectors) becomes redundant?
> >
>
> For the most part, but not when unrolling is involved, where the
> compare is done on the reduction of masks.  i.e. for
>
>   a = b > c
>   d = e > f
>   g = d | a
>   if (g != 0)
>     ...
>
> we can't really use the new optabs as `g` itself is not a comparison.

So it isn't as efficient to use cbranch_any (g != 0) here?  I think it
should be practically equivalent...

Richard.

> For VLA though this happens less often due to the support for widening
> loads and narrowing stores.
>
> Thanks,
> Tamar
>
> > Richard.
> >
> > > And the reason for both is that for Adv. SIMD there's no mask at
> > > GIMPLE level and we have to make it during expand.
> > >
> > > Thanks,
> > > Tamar
> > >
> >
> > --
> > Richard Biener <rguent...@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
>

--
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)