> -----Original Message-----
> From: Richard Biener <[email protected]>
> Sent: 21 October 2025 12:13
> To: Tamar Christina <[email protected]>
> Cc: [email protected]; nd <[email protected]>; [email protected]
> Subject: RE: [PATCH 1/4][vect]: Add support for boolean reductions for VLA
> 
> On Tue, 21 Oct 2025, Tamar Christina wrote:
> 
> > > -----Original Message-----
> > > From: Richard Biener <[email protected]>
> > > Sent: 21 October 2025 10:27
> > > To: Tamar Christina <[email protected]>
> > > Cc: [email protected]; nd <[email protected]>; [email protected]
> > > Subject: RE: [PATCH 1/4][vect]: Add support for boolean reductions for
> VLA
> > >
> > > On Tue, 21 Oct 2025, Tamar Christina wrote:
> > >
> > > > > -----Original Message-----
> > > > > From: Richard Biener <[email protected]>
> > > > > Sent: 21 October 2025 08:52
> > > > > To: Tamar Christina <[email protected]>
> > > > > Cc: [email protected]; nd <[email protected]>;
> [email protected]
> > > > > Subject: Re: [PATCH 1/4][vect]: Add support for boolean reductions for
> > > VLA
> > > > >
> > > > > On Tue, 21 Oct 2025, Tamar Christina wrote:
> > > > >
> > > > > > The support for the new boolean reduction optabs didn't quite work
> for
> > > VLA
> > > > > > because the code later on insists on the target still having a 
> > > > > > shift-and-
> > > insert
> > > > > > optab.
> > > > > >
> > > > > > This is however not needed if the target can do the reduction using
> the
> > > new
> > > > > > optabs.  This change makes it an OR not and AND requirement to have
> > > > > > shift-and-insert and the reduc sbool optabs.
> > > > > >
> > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > > > > > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> > > > > > -m32, -m64 and no issues
> > > > > >
> > > > > > Any objections for master?
> > > > >
> > > > > I wonder if the comment isn't still true - iff we have a neutral
> > > > > {-1,-1...} and the start value is 0, don't we still do what is
> > > > > written there?  IIRC the testcases I added all have constant
> > > > > start values, so the interesting case is with unknown start value.
> > > >
> > > > That's true, I can see in get_initial_defs_for_reduction for constant
> > > > start values we just splat, which is what I expected.
> > > >
> > > > But in vect_transform_cycle_phi with only one initial value we also
> > > > Just seem to splat and defer the checking of the initial value until
> > > > after the reduction,
> > > >
> > > >           /* Try to simplify the vector initialization by applying an
> > > >              adjustment after the reduction has been performed.  This
> > > >              can also break a critical path but on the other hand
> > > >              requires to keep the initial value live across the loop.  
> > > > */
> > > >           if (neutral_op
> > > >               && initial_values.length () == 1
> > > >               && !VECT_REDUC_INFO_REUSED_ACCUMULATOR (reduc_info)
> > > >               && STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def
> > > >               && !operand_equal_p (neutral_op, initial_values[0]))
> > > >
> > > > So the interesting case is where we have an SLP tree with two unknown
> > > > non-constant values. Though it's unclear to me if we can defer the check
> > > > for a single value, why not do it for all, surely the scalar final 
> > > > reductions
> > > > are still cheaper as scalar then element wise insertion in the prologue?
> > >
> > > The idea here is of course to avoid keeping the initial value life
> > > across the loop.  There's also
> !VECT_REDUC_INFO_REUSED_ACCUMULATOR
> > > here.
> > >
> >
> > True, but they're of course scalar elements so you're less likely to have 
> > much
> > register pressure issues.  The issue is that the operations for
> SHL_AND_INSERT
> > can't be done for predicates.  At least not without round-tripping through
> data.
> 
> Undestood.
> 
> > > > >
> > > > > That said, when doing the bool reduction stuff my general impression
> > > > > was that all the checks done could have some refactoring to make it
> > > > > more obvious what check applies to what cases we have.  As you show
> > > > > here, it's not always obvious.
> > > > >
> > > >
> > > > So it looks like the check isn't needed for a single non-constant value.
> > > > I wonder.. is VECT_REDUC_INFO_INITIAL_VALUES set during
> > > > Vectorizable_reduction.  From the code it looks like I need an 
> > > > additional
> > > > check on SLP_TREE_LANES (slp_node) != 1?
> > >
> > > Maybe simply avoiding the initial value insertion when we cannot
> > > insert it ... I don't think we put this into costing in any way
> > > (we also do not compute whether we re-use an accumulator for the
> > > epilog, we just set things up in a way that it could be, but all
> > > during transform ...).
> >
> > Yeah though this relaxation came 6 years after this code was added to check
> > ror shift and insert :)
> >
> > >
> > > The initial_values.legnth () == 1 check is for SLP reductions where
> > > we cannot perform the adjustment in the epilogue because we only
> > > track the initial value of one reduction, not multiple ones as we'd
> > > need for SLP reductions (and the "live" issue would become worse there).
> > >
> > > So, you'd need to check a SLP bool reduction (two bool accumulators).
> >
> > So I tried
> >
> > https://godbolt.org/z/zPEoaWPEv
> >
> > But it looks like for the single initial value case the code you added in
> > a54aa75ab30eb1a176ceaded507113252df24878 for PR 115438 doesn't
> > trigger.
> 
> Yeah, that code added back what was there for the non-SLP case (I
> hoped this epilogue adjustment thing was not important...)
> 
> > So sadly SLP_TREE_LANES (slp_node) != 1 doesn't work...
> 
> It would be (SLP_TREE_LANES (slp_node) != 1 && !reduc_chain) anyway,
> but yes.
> 
> > So for now, I'm just testing
> >
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index 6c242024971..0aa970ab338 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -7565,6 +7565,7 @@ vectorizable_reduction (loop_vec_info
> loop_vinfo,
> >       values into the low-numbered elements.  */
> >    if ((double_reduc || neutral_op)
> >        && !nunits_out.is_constant ()
> > +      && !really_constant_p (vect_phi_initial_value (reduc_def_phi))
> >        && !direct_internal_fn_supported_p (IFN_VEC_SHL_INSERT,
> >                                           vectype_out, OPTIMIZE_FOR_SPEED))
> >      {
> >
> > Instead which is the case we know works, and it looks like we'll have to 
> > give
> > boolean reduction handling some more TLC next year.
> 
> I wonder if we can simply set a "must-use-reduc_epilouge_adjustment"
> flag if we get a not supported case here?  If there's any circumstance
> we cannot use a reduc_epilogue_adjustment we'd have to bail of course.

We don't have an analysis phase for vect_transform_cycle_phi though since it's
a transform only method.  It seems to be that vectorizable_reduction is more
restrictive than the transform phases that it triggers 
(vect_transform_cycle_phi,
vect_transform_reduction).

So I'm not sure where else we'd reject the must-use-reduc_epilogue though. So I
think it likely has to be one way?

> From what I can see that condition should indeed be
> (SLP_TREE_LANES (slp_node) != 1 && !reduc_chain).  The rest is
> really "heuristics" (IIRC even the
> VECT_REDUC_INFO_REUSED_ACCUMULATOR
> guard).  Another simple check to add above is the
> !operand_equal_p (neutral_op, initial_values[0]) check.
> 

Hmm, fair, that can replace the constant check, let me try that instead.

Thanks,
Tamar


> Richard.
> 
> > Cheers,
> > Tamar
> >
> > >
> > > Richard.
> > >
> > >
> > > > I'll write some examples and see.
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > > Richard.
> > > > >
> > > > > > Thanks,
> > > > > > Tamar
> > > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > >     * tree-vect-loop.cc (vectorizable_reduction): Don't require
> > > > > >     IFN_VEC_SHL_INSERT when using reduc sbool optabs.
> > > > > >
> > > > > > ---
> > > > > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > > > > index
> > > > >
> > >
> 6c2420249718237c4f70720b2bd03d4951bd8a5d..ca9ff35c0bb8d8c4a161
> > > > > a922bae9c19028492b66 100644
> > > > > > --- a/gcc/tree-vect-loop.cc
> > > > > > +++ b/gcc/tree-vect-loop.cc
> > > > > > @@ -7565,6 +7565,7 @@ vectorizable_reduction (loop_vec_info
> > > > > loop_vinfo,
> > > > > >       values into the low-numbered elements.  */
> > > > > >    if ((double_reduc || neutral_op)
> > > > > >        && !nunits_out.is_constant ()
> > > > > > +      && (reduc_fn == IFN_LAST || !VECTOR_BOOLEAN_TYPE_P
> > > > > (vectype_out))
> > > > > >        && !direct_internal_fn_supported_p (IFN_VEC_SHL_INSERT,
> > > > > >                                       vectype_out,
> > > > > OPTIMIZE_FOR_SPEED))
> > > > > >      {
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > > --
> > > > > Richard Biener <[email protected]>
> > > > > SUSE Software Solutions Germany GmbH,
> > > > > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > > > > Nuernberg)
> > > >
> > >
> > > --
> > > Richard Biener <[email protected]>
> > > SUSE Software Solutions Germany GmbH,
> > > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > > Nuernberg)
> >
> 
> --
> Richard Biener <[email protected]>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> Nuernberg)

Reply via email to