On Tue, 11 Nov 2025, Christopher Bazley wrote:
>
> On 05/11/2025 12:08, Richard Biener wrote:
> > On Tue, 4 Nov 2025, Christopher Bazley wrote:
> >
> >> On 04/11/2025 13:57, Christopher Bazley wrote:
> >>> On 28/10/2025 13:29, Richard Biener wrote:
> >>>> Isn't SLP_TREE_CAN_USE_PARTIAL_VECTORS_P redundant given
> >>>> SLP_TREE_CAN_USE_MASK_P || SLP_TREE_CAN_USE_LEN_P should be exactly this?
> >>>>
> >>>> SLP_TREE_CAN_USE_PARTIAL_VECTORS_P might be sth for an SLP instance
> >>>> (or a subgraph with multiple entries (instances)) if we want to have
> >>>> consistent len vs. mask use? (but I see no particular reason to force
> >>>> consistency)
> >>> SLP_TREE_CAN_USE_PARTIAL_VECTORS_P is initialised to true and may
> >>> subsequently be set to false via vect_cannot_use_partial_vectors.
> >>>
> >>> vect_analyze_stmt uses the value of SLP_TREE_CAN_USE_PARTIAL_VECTORS_P to
> >>> decided whether to return a failure result in cases where tail-predication
> >>> is required. If SLP_TREE_CAN_USE_MASK_P || SLP_TREE_CAN_USE_LEN_P were
> >>> used
> >>> for that purpose instead, it would follow that neither
> >>> SLP_TREE_CAN_USE_MASK_P nor SLP_TREE_CAN_USE_LEN_P could be set to true in
> >>> cases where vect_cannot_use_partial_vectors might subsequently be called
> >>> (which seems impossible because vect_load_lanes_supported can be called
> >>> with
> >>> different values of 'count', and we cannot predict those values) , or else
> >>> that vect_cannot_use_partial_vectors would have to set both
> >>> SLP_TREE_CAN_USE_MASK_P and SLP_TREE_CAN_USE_LEN_P to false.
> >>>
> >>> Setting both flags to false in vect_cannot_use_partial_vectors would be
> >>> trivial, but SLP_TREE_CAN_USE_PARTIAL_VECTORS_P has another purpose: it
> >>> gives the return value of vect_can_use_partial_vectors_p. How did you
> >>> envisage the return value of vect_can_use_partial_vectors_p being decided
> >>> for BB SLP? If it always returns true then I think that the vectoriser
> >>> might
> >>> carry on trying to use partial vectors when it should have already given
> >>> up;
> >>> if it returns SLP_TREE_CAN_USE_MASK_P || SLP_TREE_CAN_USE_LEN_P then that
> >>> would prevent check_load_store_for_partial_vectors from being called for
> >>> the
> >>> first time.
> >>>
> >>> Together, the three flags allow the following states to be represented:
> >>>
> >>> 1. Might be able to operate on partial vectors, but don't yet know whether
> >>> we would use len or mask.
> >>>
> >>> 2. Might be able to operate on partial vectors with len.
> >>>
> >>> 3. Might be able to operate on partial vectors with mask.
> >>>
> >>> 4. (Invalid) Might be able to operate on partial vectors with both len and
> >>> mask.
> >>>
> >>> 5. Cannot operate on partial vectors.
> >>>
> >>> 6. (Strictly redundant) Cannot operate on partial vectors although we
> >>> previously thought we might be able to use len.
> >>>
> >>> 7. (Strictly redundant) Cannot operate on partial vectors although we
> >>> previously thought we might be able to use mask.
> >>>
> >>> 8. (Invalid) Cannot operate on partial vectors although we previously
> >>> thought we might be able to use both len and mask.
> >>>
> >>> It would be impossible to encode the four states that neither invalid nor
> >>> redundant in only two bits. In any case, my goal was to keep the new logic
> >>> for BB SLP as close as possible to the existing logic for loop
> >>> vectorisation.
> >> Sorry, it would clearly be possible to encode four values in 2 bits (00,
> >> 01,
> >> 10, 11) by encoding the "Might be able to operate on partial vectors" state
> >> as
> >> both bits clear and "Cannot operate on partial vectors" state as "Might be
> >> able to operate on partial vectors with both len and mask" (both bits set).
> >> A
> >> reversal of the current encoding would probably make more sense: set both
> >> 'len' and 'mask' bits at the start, clear the 'mask' bit when 'len' is
> >> chosen,
> >> clear the 'len' bit when 'mask' is chosen, and clear both bits when neither
> >> is
> >> valid. Maybe this encoding could also be applied to the loop vectoriser.
> >>
> >> Do you want me to make that change?
> > So I think I'm somewhat confused to the extent that with BB
> > vectorization we do not really have a choice - if partial vectors
> > are necessary (because there's padding), then if we cannot use
> > partial vectors, the stmt analysis should fail. I had the impression
>
> That's what this code in does vect_analyze_stmt:
>
> if (bb_vinfo)
> {
> unsigned int group_size = SLP_TREE_LANES (node);
> tree vectype = SLP_TREE_VECTYPE (node);
> poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
> bool needs_partial = known_lt (group_size, nunits);
> if (needs_partial && !SLP_TREE_CAN_USE_PARTIAL_VECTORS_P (node))
> return opt_result::failure_at (stmt_info->stmt,
> "not vectorized: SLP node needs but "
> "cannot use partial vectors: %G",
> stmt_info->stmt);
> if (maybe_gt (group_size, nunits))
> gcc_assert (multiple_p (group_size, nunits));
> }
Conservative enough I guess, some operations might not need masking,
so it can be improved.
> > that vectorizable_* computes the per-SLP node SLP_TREE_CAN_USE_*.
>
> That's right.
>
> > Why would it even say that it can use partial vectors when it doesn't
> > need to for example?
>
> It won't through SLP_TREE_CAN_USE_MASK_P or SLP_TREE_CAN_USE_LEN_P; however,
> SLP_TREE_CAN_USE_PARTIAL_VECTORS_P is initialised to true for every SLP node
> and only set to false if the vectoriser wanted to use partial vectors but
> cannot. I don't personally find that confusing, because 'can' doesn't imply
> 'will' to me.
>
> > Consider a target that can do a mask load and
> > a mask store but not masked operations and
> >
> > a[0] = 5 + b[0];
> > a[1] = 5 + b[1];
> > a[2] = 5 + b[2];
> >
> > on unsigned int, we'd chose V4SImode and both load and store can do
> > SLP_TREE_CAN_USE_MASK_P. The add cannot, but it doesn't need to
> > mask the padding, so !SLP_TREE_CAN_USE_MASK_P and all stmt analyses
> > should succeed.
>
> That's right. Admittedly, SLP_TREE_CAN_USE_MASK_P does not seem like a good
> predicate name anymore.
I think SLP_TREE_NEEDS_PARTIAL_VECTORS_P might be better (covering
either mask or len). But then we can decide in the first place
because we should fail. The only reason to delay might be if we can
use both (but as you said, we'd never actually check for both!).
> > If the target cannot do a mask load then vectorization
> > with V4SImode should fail vectorizable_load (), no need to set
> > slp-tree-would-need-partial-vectors-but-cannot.
> That would require special-casing of BB-SLP in existing vectorizable_*
> functions. Changing the control flow of those functions or adding complexity
> is something that I explicitly wanted to avoid.
Most of the functions that only need masking in some cases already know
to do this for loop vectorization, so I don't think this would require
any adjustments?
> > So for BB vect isn't it SLP_TREE_MUST_USE_PARTIAL_VECTORS_P
> > and then either or both of SLP_TREE_CAN_USE_MASK_P and
> > SLP_TREE_CAN_USE_LEN_P?
>
> SLP_TREE_MUST_USE_PARTIAL_VECTORS_P does not exist. I added assertions to
> check that SLP_TREE_CAN_USE_MASK_P or SLP_TREE_CAN_USE_LEN_P cannot be set to
> true if the other predicate is already true. e.g., in vect_record_mask:
>
> gcc_checking_assert (!SLP_TREE_CAN_USE_LEN_P (slp_node));
> SLP_TREE_CAN_USE_MASK_P (slp_node) = true;
>
> It would be more accurate to say that either SLP_TREE_CAN_USE_MASK_P is true,
> SLP_TREE_CAN_USE_LEN_P is true, or SLP_TREE_CAN_USE_PARTIAL_VECTORS_P is
> false.
>
> > And we do not really need any upthread
> > decision, thus analysis would only need to decide and remember the actual
> > partial vector method used? If that's correct I'd rather see
> > a SLP_TREE_PARTIAL_VECTORS_STYLE-like enum here?
>
> OK, I'll see if I can implement one.
>
> > With loop vect we have the global decision, so we need to record
> > what we can use and what not, but with BB vect we can decide
> > immediately given there's no "fallback".
> >
> > Richard.
>
> Understood.
>
> Thanks,
>
>
--
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Jochen Jaser, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)