length for BB SLP

Christopher Bazley Wed, 26 Nov 2025 06:11:06 -0800


On 24/11/2025 13:05, Richard Biener wrote:

On Fri, 14 Nov 2025, Christopher Bazley wrote:

On 12/11/2025 14:27, Richard Biener wrote:

On Tue, 11 Nov 2025, Christopher Bazley wrote:

On 05/11/2025 12:08, Richard Biener wrote:

On Tue, 4 Nov 2025, Christopher Bazley wrote:

On 04/11/2025 13:57, Christopher Bazley wrote:

On 28/10/2025 13:29, Richard Biener wrote:

Isn't SLP_TREE_CAN_USE_PARTIAL_VECTORS_P redundant given
SLP_TREE_CAN_USE_MASK_P || SLP_TREE_CAN_USE_LEN_P should be exactly
this?

SLP_TREE_CAN_USE_PARTIAL_VECTORS_P might be sth for an SLP instance
(or a subgraph with multiple entries (instances)) if we want to have
consistent len vs. mask use?  (but I see no particular reason to force
consistency)

SLP_TREE_CAN_USE_PARTIAL_VECTORS_P is initialised to true and may
subsequently be set to false via vect_cannot_use_partial_vectors.

vect_analyze_stmt uses the value of SLP_TREE_CAN_USE_PARTIAL_VECTORS_P
to
decided whether to return a failure result in cases where
tail-predication
is required. If SLP_TREE_CAN_USE_MASK_P || SLP_TREE_CAN_USE_LEN_P were
used
for that purpose instead, it would follow that neither
SLP_TREE_CAN_USE_MASK_P nor SLP_TREE_CAN_USE_LEN_P could be set to true
in
cases where vect_cannot_use_partial_vectors might subsequently be called
(which seems impossible because vect_load_lanes_supported can be called
with
different values of 'count', and we cannot predict those values) , or
else
that vect_cannot_use_partial_vectors would have to set both
SLP_TREE_CAN_USE_MASK_P and SLP_TREE_CAN_USE_LEN_P to false.

Setting both flags to false in vect_cannot_use_partial_vectors would be
trivial, but SLP_TREE_CAN_USE_PARTIAL_VECTORS_P has another purpose: it
gives the return value of vect_can_use_partial_vectors_p. How did you
envisage the return value of vect_can_use_partial_vectors_p being
decided
for BB SLP? If it always returns true then I think that the vectoriser
might
carry on trying to use partial vectors when it should have already given
up;
if it returns SLP_TREE_CAN_USE_MASK_P || SLP_TREE_CAN_USE_LEN_P then
that
would prevent check_load_store_for_partial_vectors from being called for
the
first time.

Together, the three flags allow the following states to be represented:

1. Might be able to operate on partial vectors, but don't yet know
whether
we would use len or mask.

2. Might be able to operate on partial vectors with len.

3. Might be able to operate on partial vectors with mask.

4. (Invalid) Might be able to operate on partial vectors with both len
and
mask.

5. Cannot operate on partial vectors.

6. (Strictly redundant) Cannot operate on partial vectors although we
previously thought we might be able to use len.

7. (Strictly redundant) Cannot operate on partial vectors although we
previously thought we might be able to use mask.

8. (Invalid) Cannot operate on partial vectors although we previously
thought we might be able to use both len and mask.

It would be impossible to encode the four states that neither invalid
nor
redundant in only two bits. In any case, my goal was to keep the new
logic
for BB SLP as close as possible to the existing logic for loop
vectorisation.

Sorry, it would clearly be possible to encode four values in 2 bits (00,
01,
10, 11) by encoding the "Might be able to operate on partial vectors"
state
as
both bits clear and "Cannot operate on partial vectors" state as "Might
be
able to operate on partial vectors with both len and mask" (both bits
set).
A
reversal of the current encoding would probably make more sense: set both
'len' and 'mask' bits at the start, clear the 'mask' bit when 'len' is
chosen,
clear the 'len' bit when 'mask' is chosen, and clear both bits when
neither
is
valid. Maybe this encoding could also be applied to the loop vectoriser.

Do you want me to make that change?

So I think I'm somewhat confused to the extent that with BB
vectorization we do not really have a choice - if partial vectors
are necessary (because there's padding), then if we cannot use
partial vectors, the stmt analysis should fail.  I had the impression

That's what this code in does vect_analyze_stmt:

    if (bb_vinfo)
      {
        unsigned int group_size = SLP_TREE_LANES (node);
        tree vectype = SLP_TREE_VECTYPE (node);
        poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
        bool needs_partial = known_lt (group_size, nunits);
        if (needs_partial && !SLP_TREE_CAN_USE_PARTIAL_VECTORS_P (node))
      return opt_result::failure_at (stmt_info->stmt,
                         "not vectorized: SLP node needs but "
                         "cannot use partial vectors: %G",
                         stmt_info->stmt);
        if (maybe_gt (group_size, nunits))
      gcc_assert (multiple_p (group_size, nunits));
      }

Conservative enough I guess, some operations might not need masking,
so it can be improved.

The current mechanism works regardless of whether operations such as 'add'
need a mask or not.

The SLP_TREE_CAN_USE_PARTIAL_VECTORS_P flag just records whether the target
could support the operation if the vector were partial; it does not say
anything about whether the vector is partial, or how the target would support
the operation on a partial vector. Some kinds of node require mask/length for
partial vectors; others don't. SLP_TREE_CAN_USE_PARTIAL_VECTORS_P returns true
for nodes in the second category.

For loop masking we have LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P set
by operations that cannot be implemented correctly when not using masking.

The must_use_partial_vectors_p flag is initialized to false rather thantrue by the constructor of _loop_vec_info. It may subsequently be set totrue by vect_analyze_stmt -> vectorizable_(store|load) ->get_load_store_type (including dependent onLOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P among other conditions). None ofthe conditions that cause must_use_partial_vectors_p to be set to trueseem to apply to BB SLP (e.g., BB SLP does not have overrun_p, earlybreaks, or dr_safe_speculative_read_required).

I see most of the BB "partial" vector operations like those, and if
we fail to support masking for such an operation it's vectorizable_*
analysis should fail.   Likewise for operations that don't mind there's
not really anything to record.

SLP_TREE_CAN_USE_PARTIAL_VECTORS_P is initialised to true by theconstructor of _bb_vec_info and may be set to false by vectorizable_*under the same circumstances that LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_Pis set to false (target does not support length/mask).

If there were an equivalent flag for BB SLP vectorization, then it wouldpresumably likewise be set by get_load_store_type (ignoring the factthat we might want to apply masks to arithmetic operations in future),but perhaps a new flag would not be needed if get_load_store_typereturned false.

I get that your way to abstract BB vs. loop operation does not support
failing vectorizable_* this way since you do not teach vectorizable_*
whether needs_partial but you initialize
SLP_TREE_CAN_USE_PARTIAL_VECTORS_P to true in this case?

Unfortunately, I do not think that "...teaching vectorizable_* whetherneeds_partial..." would be sufficient because needs_partial does notmean much unless SLP_TREE_CAN_USE_PARTIAL_VECTORS_P is false. I think Iwould also have to teach vectorizable_* to determine whether partialvectors are supported for each operation type. That code already existsfor loop vectorisation, therefore I chose to leverage it.

Note that get_load_store_type is called earlier thancheck_load_store_for_partial_vectors, therefore it lacks some of thecontext to decide whether to return false or not. The vectoriser hasn'tyet examined the target's capabilities to support partial vectors at thepoint where get_load_store_type is called.

I can see the attraction of what you are proposing, but I don't think itfits as neatly into the existing code as my current solution and I wouldbe worried about making mistakes or omissions when trying to make thealterations you are asking for. I already did an extra round ofdebugging to switch to use of an enum for vect_record_len andvect_record_mask.

The vect_record_loop_{len,mask} abstraction should still record
what style we used for the particular SLP node.

Sorry for taking so long to respond, but the discussion seems to go
over too long time and disconnected from actual patches.  Is the [v2]
patchset still current?


No, it's not; I sent v3 on Monday but I see that you already replied to one of 
those emails.

Thanks,
--
Christopher Bazley
Staff Software Engineer, GNU Tools Team.
Arm Ltd, 110 Fulbourn Road, Cambridge, CB1 9NJ, UK.
http://www.arm.com/

Re: [RFC 3/9] Implement recording/getting of mask/length for BB SLP

Reply via email to