> -----Original Message----- > From: Richard Biener <[email protected]> > Sent: 21 October 2025 10:27 > To: Tamar Christina <[email protected]> > Cc: [email protected]; nd <[email protected]>; [email protected] > Subject: RE: [PATCH 1/4][vect]: Add support for boolean reductions for VLA > > On Tue, 21 Oct 2025, Tamar Christina wrote: > > > > -----Original Message----- > > > From: Richard Biener <[email protected]> > > > Sent: 21 October 2025 08:52 > > > To: Tamar Christina <[email protected]> > > > Cc: [email protected]; nd <[email protected]>; [email protected] > > > Subject: Re: [PATCH 1/4][vect]: Add support for boolean reductions for > VLA > > > > > > On Tue, 21 Oct 2025, Tamar Christina wrote: > > > > > > > The support for the new boolean reduction optabs didn't quite work for > VLA > > > > because the code later on insists on the target still having a > > > > shift-and- > insert > > > > optab. > > > > > > > > This is however not needed if the target can do the reduction using the > new > > > > optabs. This change makes it an OR not and AND requirement to have > > > > shift-and-insert and the reduc sbool optabs. > > > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu, > > > > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu > > > > -m32, -m64 and no issues > > > > > > > > Any objections for master? > > > > > > I wonder if the comment isn't still true - iff we have a neutral > > > {-1,-1...} and the start value is 0, don't we still do what is > > > written there? IIRC the testcases I added all have constant > > > start values, so the interesting case is with unknown start value. > > > > That's true, I can see in get_initial_defs_for_reduction for constant > > start values we just splat, which is what I expected. > > > > But in vect_transform_cycle_phi with only one initial value we also > > Just seem to splat and defer the checking of the initial value until > > after the reduction, > > > > /* Try to simplify the vector initialization by applying an > > adjustment after the reduction has been performed. This > > can also break a critical path but on the other hand > > requires to keep the initial value live across the loop. */ > > if (neutral_op > > && initial_values.length () == 1 > > && !VECT_REDUC_INFO_REUSED_ACCUMULATOR (reduc_info) > > && STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def > > && !operand_equal_p (neutral_op, initial_values[0])) > > > > So the interesting case is where we have an SLP tree with two unknown > > non-constant values. Though it's unclear to me if we can defer the check > > for a single value, why not do it for all, surely the scalar final > > reductions > > are still cheaper as scalar then element wise insertion in the prologue? > > The idea here is of course to avoid keeping the initial value life > across the loop. There's also !VECT_REDUC_INFO_REUSED_ACCUMULATOR > here. >
True, but they're of course scalar elements so you're less likely to have much register pressure issues. The issue is that the operations for SHL_AND_INSERT can't be done for predicates. At least not without round-tripping through data. > > > > > > That said, when doing the bool reduction stuff my general impression > > > was that all the checks done could have some refactoring to make it > > > more obvious what check applies to what cases we have. As you show > > > here, it's not always obvious. > > > > > > > So it looks like the check isn't needed for a single non-constant value. > > I wonder.. is VECT_REDUC_INFO_INITIAL_VALUES set during > > Vectorizable_reduction. From the code it looks like I need an additional > > check on SLP_TREE_LANES (slp_node) != 1? > > Maybe simply avoiding the initial value insertion when we cannot > insert it ... I don't think we put this into costing in any way > (we also do not compute whether we re-use an accumulator for the > epilog, we just set things up in a way that it could be, but all > during transform ...). Yeah though this relaxation came 6 years after this code was added to check ror shift and insert :) > > The initial_values.legnth () == 1 check is for SLP reductions where > we cannot perform the adjustment in the epilogue because we only > track the initial value of one reduction, not multiple ones as we'd > need for SLP reductions (and the "live" issue would become worse there). > > So, you'd need to check a SLP bool reduction (two bool accumulators). So I tried https://godbolt.org/z/zPEoaWPEv But it looks like for the single initial value case the code you added in a54aa75ab30eb1a176ceaded507113252df24878 for PR 115438 doesn't trigger. So sadly SLP_TREE_LANES (slp_node) != 1 doesn't work... So for now, I'm just testing diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index 6c242024971..0aa970ab338 100644 --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -7565,6 +7565,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo, values into the low-numbered elements. */ if ((double_reduc || neutral_op) && !nunits_out.is_constant () + && !really_constant_p (vect_phi_initial_value (reduc_def_phi)) && !direct_internal_fn_supported_p (IFN_VEC_SHL_INSERT, vectype_out, OPTIMIZE_FOR_SPEED)) { Instead which is the case we know works, and it looks like we'll have to give boolean reduction handling some more TLC next year. Cheers, Tamar > > Richard. > > > > I'll write some examples and see. > > > > Thanks, > > Tamar > > > > > Richard. > > > > > > > Thanks, > > > > Tamar > > > > > > > > gcc/ChangeLog: > > > > > > > > * tree-vect-loop.cc (vectorizable_reduction): Don't require > > > > IFN_VEC_SHL_INSERT when using reduc sbool optabs. > > > > > > > > --- > > > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc > > > > index > > > > 6c2420249718237c4f70720b2bd03d4951bd8a5d..ca9ff35c0bb8d8c4a161 > > > a922bae9c19028492b66 100644 > > > > --- a/gcc/tree-vect-loop.cc > > > > +++ b/gcc/tree-vect-loop.cc > > > > @@ -7565,6 +7565,7 @@ vectorizable_reduction (loop_vec_info > > > loop_vinfo, > > > > values into the low-numbered elements. */ > > > > if ((double_reduc || neutral_op) > > > > && !nunits_out.is_constant () > > > > + && (reduc_fn == IFN_LAST || !VECTOR_BOOLEAN_TYPE_P > > > (vectype_out)) > > > > && !direct_internal_fn_supported_p (IFN_VEC_SHL_INSERT, > > > > vectype_out, > > > OPTIMIZE_FOR_SPEED)) > > > > { > > > > > > > > > > > > > > > > > > -- > > > Richard Biener <[email protected]> > > > SUSE Software Solutions Germany GmbH, > > > Frankenstrasse 146, 90461 Nuernberg, Germany; > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG > > > Nuernberg) > > > > -- > Richard Biener <[email protected]> > SUSE Software Solutions Germany GmbH, > Frankenstrasse 146, 90461 Nuernberg, Germany; > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG > Nuernberg)
