On Tue, Jun 9, 2026 at 11:24 AM Richard Biener
<[email protected]> wrote:
>
> On Wed, Jun 3, 2026 at 5:20 PM Christopher Bazley <[email protected]> wrote:
> >
> > Add two new fields to SLP tree nodes, which are accessed as
> > SLP_TREE_CAN_USE_PARTIAL_VECTORS_P and SLP_TREE_PARTIAL_VECTORS_STYLE.
> >
> > SLP_TREE_CAN_USE_PARTIAL_VECTORS_P is analogous to the existing
> > predicate LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P. It is initialized to
> > true. This flag just records whether the target could vectorize a
> > node using a partial vector; it does not say anything about
> > whether the vector actually is partial, or how the target would support
> > use of a partial vector. Some kinds of node require mask/length for
> > partial vectors; others don't. In the latter case (e.g., for add
> > operations), SLP_TREE_CAN_USE_PARTIAL_VECTORS_P will remain true.
> >
> > SLP_TREE_PARTIAL_VECTORS_STYLE is analogous to the existing field
> > LOOP_VINFO_PARTIAL_VECTORS_STYLE. Both are initialized to 'none'.
> > The vect_partial_vectors_avx512 enumerator is not used for BB SLP.
> > Unlike loop vectorization, a different style of partial vectors can be
> > chosen for each node during analysis of that node.

To add: SLP_TREE_PARTIAL_VECTORS_STYLE should not be per SLP node but
per 'vinfo'.  It doesn't make much sense to have differing styles
active.  So please just move the partial vectors style from loop_vinfo
to the vinfo base class.

The commit message does not mention SLP_TREE_NUM_PARTIAL_VECTORS, which
you add per node, or explain why it is needed.  It definitely shouldn't
live there; it's an odd counter that's going to be 0 or 1 exactly when
we decide to (possibly) use partial vectors?
As said, I think we want to re-use the loop mask/len tracking here.
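
A minimal sketch of what moving the style to the base class could look
like.  It uses simplified stand-in types rather than the real vec_info
hierarchy; every name ending in _sketch and all three helper functions
are hypothetical and only illustrate the shape of the change:

```cpp
#include <cassert>

enum vect_partial_vector_style {
  vect_partial_vectors_none,
  vect_partial_vectors_while_ult,
  vect_partial_vectors_avx512,
  vect_partial_vectors_len
};

/* Hypothetical stand-in for the vec_info base class: one style per
   vectorization region instead of one per SLP node.  */
struct vec_info_sketch
{
  vect_partial_vector_style partial_vector_style = vect_partial_vectors_none;
  virtual ~vec_info_sketch () = default;
};

/* Hypothetical stand-ins for the loop and BB specializations.  */
struct loop_vec_info_sketch : vec_info_sketch {};
struct bb_vec_info_sketch : vec_info_sketch {};

/* Record STYLE on VINFO.  Return false if a different style was
   already recorded, so the caller can reject partial vectors.  */
inline bool
record_partial_vector_style (vec_info_sketch &vinfo,
			     vect_partial_vector_style style)
{
  assert (style != vect_partial_vectors_none);
  if (vinfo.partial_vector_style == vect_partial_vectors_none)
    vinfo.partial_vector_style = style;
  return vinfo.partial_vector_style == style;
}

inline bool
uses_masks_p (const vec_info_sketch &vinfo)
{
  return vinfo.partial_vector_style == vect_partial_vectors_while_ult;
}

inline bool
uses_lengths_p (const vec_info_sketch &vinfo)
{
  return vinfo.partial_vector_style == vect_partial_vectors_len;
}
```

With this shape, a second request for a conflicting style on the same
region is reported to the caller instead of asserting per node.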

> >
> > Implement the recently-introduced wrapper functions,
> > vect_record_(len|mask), for BB SLP by setting
> > SLP_TREE_PARTIAL_VECTORS_STYLE to indicate that a mask or length should
> > be used for a given SLP node. The passed-in vec_info is ignored.
> >
> > Implement the vect_fully_(masked|with_length)_p wrapper functions for
> > BB SLP by checking the SLP_TREE_PARTIAL_VECTORS_STYLE. This should be
> > sufficient because at most one of vect_record_(len|mask) and
> > vect_cannot_use_partial_vectors is expected to be called for any
> > given SLP node. SLP_TREE_CAN_USE_PARTIAL_VECTORS_P should be true if
> > the style is not 'none', but its value isn't used beyond the analysis
> > phase.
> >
> > The implementations of vect_get_mask and vect_get_len for BB SLP are
> > non-trivial (albeit simpler than for loop vectorization), therefore they
> > are delegated to SLP-specific functions defined in tree-vect-slp.cc.
> >
> > Implement the vect_cannot_use_partial_vectors wrapper function by
> > setting the SLP_TREE_CAN_USE_PARTIAL_VECTORS_P flag to false.
> > To prevent regressions, vect_can_use_partial_vectors_p still returns
> > false for BB SLP regardless (for now). This prevents vect_record_mask
> > or vect_record_len from being called.
> >
> > gcc/ChangeLog:
> >
> >         * tree-vect-slp.cc (_slp_tree::_slp_tree): initialize new
> >         partial_vector_style, can_use_partial_vectors and
> >         num_partial_vectors members.
> >         (vect_slp_analyze_node_operations): Account for worst-case
> >         prologue costs of per-node partial-vector mask or length
> >         materialisation.
> >         (vect_slp_record_bb_style): Set the partial vector style of an
> >         SLP node, checking that the style does not flip-flop between mask
> >         and length.
> >         (vect_slp_record_bb_mask): Use vect_slp_record_bb_style to set
> >         the partial vector style of the SLP tree node to
> >         vect_partial_vectors_while_ult.
> >         (vect_slp_get_bb_mask): New function to materialize a mask for
> >         basic block SLP vectorization.
> >         (vect_slp_record_bb_len): Use vect_slp_record_bb_style to set
> >         the partial vector style of the SLP tree node to
> >         vect_partial_vectors_len.
> >         (vect_slp_get_bb_len): New function to materialize a length for
> >         basic block SLP vectorization.
> >         * tree-vect-stmts.cc (vectorizable_internal_function):
> >         (vect_record_mask): Handle the basic block SLP use case by
> >         delegating to vect_slp_record_bb_mask.
> >         (vect_get_mask): Handle the basic block SLP use case by
> >         delegating to vect_slp_get_bb_mask.
> >         (vect_record_len): Handle the basic block SLP use case by
> >         delegating to vect_slp_record_bb_len.
> >         (vect_get_len): Handle the basic block SLP use case by
> >         delegating to vect_slp_get_bb_len.
> >         (vect_gen_while_ssa_name): New function containing code
> >         refactored out of vect_gen_while for reuse by
> >         vect_slp_get_bb_mask.
> >         (vect_gen_while): Use vect_gen_while_ssa_name instead of custom
> >         code for some of the implementation.
> >         * tree-vectorizer.h (enum vect_partial_vector_style): Move this
> >         definition earlier to allow reuse by struct _slp_tree.
> >         (struct _slp_tree): Add a partial_vector_style member to record
> >         whether to use a length or mask for the SLP tree node, if
> >         partial vectors are required and supported.
> >         Add a can_use_partial_vectors member to record whether partial
> >         vectors are supported for the SLP tree node.
> >         Add a num_partial_vectors member for costing.
> >         (SLP_TREE_PARTIAL_VECTORS_STYLE): New member accessor macro.
> >         (SLP_TREE_CAN_USE_PARTIAL_VECTORS_P): New member accessor macro.
> >         (SLP_TREE_NUM_PARTIAL_VECTORS): New member accessor macro.
> >         (vect_gen_while_ssa_name): Declaration of a new function.
> >         (vect_slp_get_bb_mask): As above.
> >         (vect_slp_get_bb_len): As above.
> >         (vect_cannot_use_partial_vectors): Handle the basic block SLP
> >         use-case by setting SLP_TREE_CAN_USE_PARTIAL_VECTORS_P to
> >         false.
> >         (vect_fully_with_length_p): Handle the basic block SLP use
> >         case by checking whether the SLP_TREE_PARTIAL_VECTORS_STYLE is
> >         vect_partial_vectors_len.
> >         (vect_fully_masked_p): Handle the basic block SLP use case by
> >         checking whether the SLP_TREE_PARTIAL_VECTORS_STYLE is
> >         vect_partial_vectors_while_ult.
> > ---
> >  gcc/tree-vect-slp.cc   | 182 +++++++++++++++++++++++++++++++++++++++++
> >  gcc/tree-vect-stmts.cc |  52 +++++++-----
> >  gcc/tree-vectorizer.h  |  52 ++++++++----
> >  3 files changed, 247 insertions(+), 39 deletions(-)
> >
> > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > index 075e93f04a9..4dd7e6e1e21 100644
> > --- a/gcc/tree-vect-slp.cc
> > +++ b/gcc/tree-vect-slp.cc
> > @@ -125,6 +125,9 @@ _slp_tree::_slp_tree ()
> >    SLP_TREE_GS_BASE (this) = NULL_TREE;
> >    this->ldst_lanes = false;
> >    this->avoid_stlf_fail = false;
> > +  SLP_TREE_PARTIAL_VECTORS_STYLE (this) = vect_partial_vectors_none;
> > +  SLP_TREE_CAN_USE_PARTIAL_VECTORS_P (this) = true;
> > +  SLP_TREE_NUM_PARTIAL_VECTORS (this) = 0;
> >    SLP_TREE_VECTYPE (this) = NULL_TREE;
> >    SLP_TREE_REPRESENTATIVE (this) = NULL;
> >    this->cycle_info.id = -1;
> > @@ -8958,6 +8961,40 @@ vect_slp_analyze_node_operations (vec_info *vinfo, slp_tree node,
> >           vect_prologue_cost_for_slp (vinfo, child, cost_vec);
> >         }
> >
> > +  if (res)
> > +    {
> > +      /* Take care of special costs for partial vectors.
> > +        Costing each partial vector is excessive for many SLP instances,
> > +        because it is common to materialise identical masks/lengths for related
> > +        operations (e.g., for vector loads and stores of the same length).
> > +        Masks/lengths can also be shared between SLP subgraphs or eliminated by
> > +        pattern-based lowering during instruction selection.  However, it's
> > +        simpler and safer to use the worst-case cost; if this ends up being the
> > +        tie-breaker between vectorizing or not, then it's probably better not
> > +        to vectorize.  */
>
> I'd prefer to do this per SLP subgraph group, based on recorded
> requirements, similar to how loop masking is set up.
>
> > +      const int num_partial_vectors = SLP_TREE_NUM_PARTIAL_VECTORS (node);
> > +
> > +      if (SLP_TREE_PARTIAL_VECTORS_STYLE (node)
> > +         == vect_partial_vectors_while_ult)
> > +       {
> > +         gcc_assert (num_partial_vectors > 0);
> > +         record_stmt_cost (cost_vec, num_partial_vectors, vector_stmt, NULL,
> > +                           NULL, NULL_TREE, 0, vect_prologue);
> > +       }
> > +      else if (SLP_TREE_PARTIAL_VECTORS_STYLE (node)
> > +              == vect_partial_vectors_len)
> > +       {
> > +         /* Need to set up a length in the prologue.  */
> > +         gcc_assert (num_partial_vectors > 0);
> > +         record_stmt_cost (cost_vec, num_partial_vectors, scalar_stmt, NULL,
> > +                           NULL, NULL_TREE, 0, vect_prologue);
> > +       }
> > +      else
> > +       {
> > +         gcc_assert (num_partial_vectors == 0);
> > +       }
> > +    }
> > +
> >    /* If this node or any of its children can't be vectorized, try pruning
> >       the tree here rather than felling the whole thing.  */
> >    if (!res && vect_slp_convert_to_external (vinfo, node, node_instance))
> > @@ -12441,3 +12478,148 @@ vect_schedule_slp (vec_info *vinfo, const vec<slp_instance> &slp_instances)
> >          }
> >      }
> >  }
> > +
> > +/* Record that a specific partial vector style could be used to vectorize
> > +   SLP_NODE if required.  */
> > +
> > +static void
> > +vect_slp_record_bb_style (slp_tree slp_node, vect_partial_vector_style style)
> > +{
> > +  gcc_assert (style != vect_partial_vectors_none);
> > +  gcc_assert (style != vect_partial_vectors_avx512);
> > +
> > +  if (SLP_TREE_PARTIAL_VECTORS_STYLE (slp_node) == vect_partial_vectors_none)
> > +    SLP_TREE_PARTIAL_VECTORS_STYLE (slp_node) = style;
> > +  else
> > +    gcc_assert (SLP_TREE_PARTIAL_VECTORS_STYLE (slp_node) == style);
> > +}
> > +
> > +/* Record that a complete set of masks associated with SLP_NODE would need to
> > +   contain a sequence of NVECTORS masks that each control a vector of type
> > +   VECTYPE.  If SCALAR_MASK is nonnull, the fully-masked loop would AND
> > +   these vector masks with the vector version of SCALAR_MASK.  */
> > +void
> > +vect_slp_record_bb_mask (slp_tree slp_node, unsigned int /* nvectors */,
> > +                        tree /* vectype */, tree /* scalar_mask */)
> > +{
> > +  vect_slp_record_bb_style (slp_node, vect_partial_vectors_while_ult);
> > +
> > +  /* FORNOW: this often overestimates the number of masks for costing purposes
> > +     because, after lowering, masks have often been eliminated, shared between
> > +     SLP nodes, or even shared between SLP subgraphs.  */
> > +  SLP_TREE_NUM_PARTIAL_VECTORS(slp_node) ++;
> > +}
> > +
> > +/* Materialize mask number INDEX for a group of scalar stmts in SLP_NODE that
> > +   operate on NVECTORS vectors of type VECTYPE, where 0 <= INDEX < NVECTORS.
> > +   Insert any set-up statements before GSI.  */
> > +
> > +tree
> > +vect_slp_get_bb_mask (slp_tree slp_node, gimple_stmt_iterator *gsi,
> > +                     unsigned int nvectors, tree vectype, unsigned int index)
> > +{
> > +  gcc_assert (SLP_TREE_PARTIAL_VECTORS_STYLE (slp_node)
> > +             == vect_partial_vectors_while_ult);
> > +  gcc_assert (nvectors >= 1);
> > +  gcc_assert (index < nvectors);
> > +
> > +  const poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
> > +  const unsigned int group_size = SLP_TREE_LANES (slp_node);
> > +  unsigned int mask_size = group_size;
> > +  const tree masktype = truth_type_for (vectype);
> > +
> > +  if (nunits.is_constant ())
> > +    {
> > +      /* Only the last vector can be a partial vector.  */
> > +      if (index + 1 < nvectors)
> > +       return build_minus_one_cst (masktype);
> > +
> > +      /* Return a mask for a possibly-partial tail vector. */
> > +      const unsigned int const_nunits = nunits.to_constant ();
> > +      const unsigned int head_size = (nvectors - 1) * const_nunits;
> > +      gcc_assert (head_size <= group_size);
> > +      mask_size = group_size - head_size;
> > +
> > +      if (mask_size == const_nunits)
> > +       return build_minus_one_cst (masktype);
> > +    }
> > +  else
> > +    {
> > +      /* Return a mask for a single variable-length vector. */
> > +      gcc_assert (nvectors == 1);
> > +      gcc_assert (known_le (mask_size, nunits));
> > +    }
> > +
> > +  /* FORNOW: don't bother maintaining a set of mask constants to allow
> > +     sharing between nodes belonging to the same instance of bb_vec_info
> > +     or even within the same SLP subgraph.  */
>
> See above.  The loop code should already have everything set up for
> caching.  Why not reuse that?
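
To make the caching point concrete, here is a rough sketch of the kind
of per-region table that would let identical masks be materialized and
costed once.  It is not the existing loop mask code: the struct, its
members and the use of plain integers in place of tree nodes are all
assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>

/* Hypothetical cache of BB SLP masks.  An int stands in for the mask
   vector type and another int stands in for the SSA name holding the
   materialized mask.  */
struct bb_mask_cache_sketch
{
  /* Key: (mask type id, number of active lanes).  */
  std::map<std::pair<int, unsigned int>, int> masks;
  int next_id = 0;

  /* Return the mask for MASK_TYPE with ACTIVE_LANES leading lanes
     set, creating it only on the first request.  */
  int
  get_or_create (int mask_type, unsigned int active_lanes)
  {
    assert (active_lanes > 0);
    const auto key = std::make_pair (mask_type, active_lanes);
    const auto it = masks.find (key);
    if (it != masks.end ())
      return it->second;
    const int id = next_id++;
    masks.emplace (key, id);
    return id;
  }

  /* Number of distinct masks, i.e. what would be costed in the
     prologue instead of one entry per SLP node.  */
  std::size_t
  num_materialized () const
  {
    return masks.size ();
  }
};
```

A load and a store node with the same lane count and mask type would
then share one entry, which also removes the need for a per-node
counter for costing.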
>
> > +  gimple_seq stmts = NULL;
> > +  const tree cmp_type = size_type_node;
> > +  const tree start_index = build_zero_cst (cmp_type);
> > +  const tree end_index = build_int_cst (cmp_type, mask_size);
> > +  const tree mask = make_temp_ssa_name (masktype, NULL, "slp_mask");
> > +  vect_gen_while_ssa_name (&stmts, masktype, start_index, end_index, mask);
> > +  gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
> > +  return mask;
> > +}
> > +
> > +/* Record that a complete set of lengths associated with SLP_NODE would need to
> > +   contain a sequence of NVECTORS lengths for controlling an operation on
> > +   VECTYPE.  The operation splits each element of VECTYPE into FACTOR separate
> > +   subelements, measuring the length as a number of these subelements.  */
> > +
> > +void
> > +vect_slp_record_bb_len (slp_tree slp_node, unsigned int /* nvectors */,
> > +                       tree /* vectype */, unsigned int /* factor */)
> > +{
> > +  vect_slp_record_bb_style (slp_node, vect_partial_vectors_len);
> > +
> > +  /* FORNOW: this probably overestimates the number of lengths for costing
> > +     purposes because, after lowering, lengths might have been eliminated,
> > +     shared between SLP nodes, or even shared between SLP subgraphs.  */
> > +  SLP_TREE_NUM_PARTIAL_VECTORS (slp_node)++;
> > +}
> > +
> > +/* Materialize length number INDEX for a group of scalar stmts in SLP_NODE that
> > +   operate on NVECTORS vectors of type VECTYPE, where 0 <= INDEX < NVECTORS.
> > +   Return a value that contains FACTOR multiplied by the number of elements that
> > +   should be processed.  */
> > +
> > +tree
> > +vect_slp_get_bb_len (slp_tree slp_node, unsigned int nvectors, tree vectype,
> > +                    unsigned int index, unsigned int factor, bool adjusted)
> > +{
> > +  gcc_checking_assert (SLP_TREE_PARTIAL_VECTORS_STYLE (slp_node)
> > +                      == vect_partial_vectors_len);
> > +  gcc_assert (nvectors >= 1);
> > +  gcc_assert (index < nvectors);
> > +  (void) adjusted;
> > +
> > +  const poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
> > +  const unsigned int group_size = SLP_TREE_LANES (slp_node);
> > +  unsigned int len = group_size;
> > +
> > +  if (nunits.is_constant ())
> > +    {
> > +      const unsigned int const_nunits = nunits.to_constant ();
> > +
> > +      /* Only the last vector can be a partial vector.  */
> > +      if (index + 1 < nvectors)
> > +       len = const_nunits;
> > +      else
> > +       {
> > +         /* Return a length for a possibly-partial tail vector. */
> > +         const unsigned int head_size = (nvectors - 1) * const_nunits;
> > +         gcc_assert (head_size <= group_size);
> > +         len = group_size - head_size;
> > +       }
> > +    }
> > +  else
> > +    {
> > +      /* Return a length for a single variable-length vector. */
> > +      gcc_assert (nvectors == 1);
> > +      gcc_assert (known_le (len, nunits));
> > +    }
> > +
> > +  return size_int (len * factor);
> > +}
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index 15fca17a407..ecad74e7cbf 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -1385,7 +1385,9 @@ vectorizable_internal_function (combined_fn cfn, tree fndecl,
> >  /* Record that a complete set of masks associated with VINFO would need to
> >     contain a sequence of NVECTORS masks that each control a vector of type
> >     VECTYPE.  If SCALAR_MASK is nonnull, the fully-masked loop would AND
> > -   these vector masks with the vector version of SCALAR_MASK.  */
> > +   these vector masks with the vector version of SCALAR_MASK.  Alternatively,
> > +   if doing basic block vectorization, record that a mask could be used to
> > +   vectorize SLP_NODE if required.  */
> >  static void
> >  vect_record_mask (vec_info *vinfo, slp_tree slp_node, unsigned int nvectors,
> >                   tree vectype, tree scalar_mask)
> > @@ -1395,7 +1397,7 @@ vect_record_mask (vec_info *vinfo, slp_tree slp_node, unsigned int nvectors,
> >      vect_record_loop_mask (loop_vinfo, &LOOP_VINFO_MASKS (loop_vinfo), nvectors,
> >                            vectype, scalar_mask);
> >    else
> > -    (void) slp_node; /* FORNOW */
> > +    vect_slp_record_bb_mask (slp_node, nvectors, vectype, scalar_mask);
> >  }
> >
> >  /* Given a complete set of masks associated with VINFO, extract mask number
> > @@ -1413,16 +1415,15 @@ vect_get_mask (vec_info *vinfo, slp_tree slp_node, gimple_stmt_iterator *gsi,
> >      return vect_get_loop_mask (loop_vinfo, gsi, &LOOP_VINFO_MASKS (loop_vinfo),
> >                                nvectors, vectype, index);
> >    else
> > -    {
> > -      (void) slp_node; /* FORNOW */
> > -      return NULL_TREE;
> > -    }
> > +    return vect_slp_get_bb_mask (slp_node, gsi, nvectors, vectype, index);
> >  }
> >
> >  /* Record that a complete set of lengths associated with VINFO would need to
> >     contain a sequence of NVECTORS lengths for controlling an operation on
> >     VECTYPE.  The operation splits each element of VECTYPE into FACTOR separate
> > -   subelements, measuring the length as a number of these subelements.  */
> > +   subelements, measuring the length as a number of these subelements.
> > +   Alternatively, if doing basic block vectorization, record that a length limit
> > +   could be used to vectorize SLP_NODE if required.  */
> >  static void
> >  vect_record_len (vec_info *vinfo, slp_tree slp_node, unsigned int nvectors,
> >                  tree vectype, unsigned int factor)
> > @@ -1432,7 +1433,7 @@ vect_record_len (vec_info *vinfo, slp_tree slp_node, unsigned int nvectors,
> >      vect_record_loop_len (loop_vinfo, &LOOP_VINFO_LENS (loop_vinfo), nvectors,
> >                           vectype, factor);
> >    else
> > -    (void) slp_node; /* FORNOW */
> > +    vect_slp_record_bb_len (slp_node, nvectors, vectype, factor);
> >  }
> >
> >  /* Given a complete set of lengths associated with VINFO, extract length number
> > @@ -1453,10 +1454,8 @@ vect_get_len (vec_info *vinfo, slp_tree slp_node, gimple_stmt_iterator *gsi,
> >      return vect_get_loop_len (loop_vinfo, gsi, &LOOP_VINFO_LENS (loop_vinfo),
> >                               nvectors, vectype, index, factor, adjusted);
> >    else
> > -    {
> > -      (void) slp_node; /* FORNOW */
> > -      return NULL_TREE;
> > -    }
> > +    return vect_slp_get_bb_len (slp_node, nvectors, vectype, index, factor,
> > +                               adjusted);
> >  }
> >
> >  static tree permute_vec_elements (vec_info *, tree, tree, tree, stmt_vec_info,
> > @@ -14710,24 +14709,35 @@ supportable_indirect_convert_operation (code_helper code,
> >     mask[I] is true iff J + START_INDEX < END_INDEX for all J <= I.
> >     Add the statements to SEQ.  */
> >
> > +void
> > +vect_gen_while_ssa_name (gimple_seq *seq, tree mask_type, tree start_index,
> > +                        tree end_index, tree ssa_name)
> > +{
> > +  tree cmp_type = TREE_TYPE (start_index);
> > +  gcc_checking_assert (direct_internal_fn_supported_p (IFN_WHILE_ULT, cmp_type,
> > +                                                      mask_type,
> > +                                                      OPTIMIZE_FOR_SPEED));
> > +  gcall *call
> > +    = gimple_build_call_internal (IFN_WHILE_ULT, 3, start_index, end_index,
> > +                                 build_zero_cst (mask_type));
> > +  gimple_call_set_lhs (call, ssa_name);
> > +  gimple_seq_add_stmt (seq, call);
> > +}
> > +
> > +/*  Like vect_gen_while_ssa_name except that it creates a new SSA_NAME node
> > +    for type MASK_TYPE defined in the created GIMPLE_CALL statement.  If NAME
> > +    is not a null pointer then it is used for the SSA_NAME in dumps.  */
> > +
> >  tree
> >  vect_gen_while (gimple_seq *seq, tree mask_type, tree start_index,
> >                 tree end_index, const char *name)
> >  {
> > -  tree cmp_type = TREE_TYPE (start_index);
> > -  gcc_checking_assert (direct_internal_fn_supported_p (IFN_WHILE_ULT,
> > -                                                      cmp_type, mask_type,
> > -                                                      OPTIMIZE_FOR_SPEED));
> > -  gcall *call = gimple_build_call_internal (IFN_WHILE_ULT, 3,
> > -                                           start_index, end_index,
> > -                                           build_zero_cst (mask_type));
> >    tree tmp;
> >    if (name)
> >      tmp = make_temp_ssa_name (mask_type, NULL, name);
> >    else
> >      tmp = make_ssa_name (mask_type);
> > -  gimple_call_set_lhs (call, tmp);
> > -  gimple_seq_add_stmt (seq, call);
> > +  vect_gen_while_ssa_name (seq, mask_type, start_index, end_index, tmp);
> >    return tmp;
> >  }
> >
> > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > index a3855568b09..f79f04ff8ac 100644
> > --- a/gcc/tree-vectorizer.h
> > +++ b/gcc/tree-vectorizer.h
> > @@ -312,6 +312,13 @@ struct vect_load_store_data : vect_data {
> >    bool subchain_p; // VMAT_STRIDED_SLP and VMAT_GATHER_SCATTER
> >  };
> >
> > +enum vect_partial_vector_style {
> > +  vect_partial_vectors_none,
> > +  vect_partial_vectors_while_ult,
> > +  vect_partial_vectors_avx512,
> > +  vect_partial_vectors_len
> > +};
> > +
> >  /* A computation tree of an SLP instance.  Each node corresponds to a group of
> >     stmts to be packed in a SIMD stmt.  */
> >  struct _slp_tree {
> > @@ -377,7 +384,16 @@ struct _slp_tree {
> >    /* For BB vect, flag to indicate this load node should be vectorized
> >       as to avoid STLF fails because of related stores.  */
> >    bool avoid_stlf_fail;
> > -
> > +  /* The style used for implementing partial vectors if LANES is less than
> > +     the minimum number of lanes implied by the VECTYPE.  */
> > +  vect_partial_vector_style partial_vector_style;
>
> I wonder if we want to / need to mix styles across the SLP subgraph, and
> likewise whether we really need to track can_use_partial_vectors per SLP
> node as opposed to per subgraph.  I also wonder if we want to deal with the
> case of parts of the graph being unsupported because of lack of masking
> support, which we could fix by promoting that part to extern (not covered)
> rather than failing the whole subgraph.
>
> That is, I'm questioning (maybe again?) the overall tracking/analysis phase.
>
> > +  /* Flag to indicate whether we still have the option of vectorizing this node
> > +     using partial vectors (i.e.  using lengths or masks to prevent use of
> > +     inactive scalar lanes).  */
> > +  bool can_use_partial_vectors;
> > +  /* Number of partial vectors, for costing purposes. Should be 0 unless a
> > +     partial vector style has been set.  */
> > +  int num_partial_vectors;
> >    int vertex;
> >
> >    /* The kind of operation as determined by analysis and optional
> > @@ -476,6 +492,9 @@ public:
> >  #define SLP_TREE_GS_BASE(S)                     (S)->gs_base
> >  #define SLP_TREE_REDUC_IDX(S)                   (S)->cycle_info.reduc_idx
> >  #define SLP_TREE_PERMUTE_P(S)                   ((S)->code == VEC_PERM_EXPR)
> > +#define SLP_TREE_PARTIAL_VECTORS_STYLE(S)       (S)->partial_vector_style
> > +#define SLP_TREE_CAN_USE_PARTIAL_VECTORS_P(S)   (S)->can_use_partial_vectors
> > +#define SLP_TREE_NUM_PARTIAL_VECTORS(S)                 (S)->num_partial_vectors
> >
> >  inline vect_memory_access_type
> >  SLP_TREE_MEMORY_ACCESS_TYPE (slp_tree node)
> > @@ -486,13 +505,6 @@ SLP_TREE_MEMORY_ACCESS_TYPE (slp_tree node)
> >    return VMAT_UNINITIALIZED;
> >  }
> >
> > -enum vect_partial_vector_style {
> > -    vect_partial_vectors_none,
> > -    vect_partial_vectors_while_ult,
> > -    vect_partial_vectors_avx512,
> > -    vect_partial_vectors_len
> > -};
> > -
> >  /* Key for map that records association between
> >     scalar conditions and corresponding loop mask, and
> >     is populated by vect_record_loop_mask.  */
> > @@ -2607,6 +2619,7 @@ extern tree vect_gen_perm_mask_checked (tree, const vec_perm_indices &);
> >  extern void optimize_mask_stores (class loop*);
> >  extern tree vect_gen_while (gimple_seq *, tree, tree, tree,
> >                             const char * = nullptr);
> > +extern void vect_gen_while_ssa_name (gimple_seq *, tree, tree, tree, tree);
> >  extern tree vect_gen_while_not (gimple_seq *, tree, tree, tree);
> >  extern opt_result vect_get_vector_types_for_stmt (vec_info *,
> >                                                   stmt_vec_info, tree *,
> > @@ -2788,7 +2801,14 @@ extern slp_tree vect_create_new_slp_node (unsigned, tree_code);
> >  extern void vect_free_slp_tree (slp_tree);
> >  extern bool compatible_calls_p (gcall *, gcall *, bool);
> >  extern int vect_slp_child_index_for_operand (const stmt_vec_info, int op);
> > -
> > +extern void vect_slp_record_bb_mask (slp_tree slp_node, unsigned int nvectors,
> > +                                    tree vectype, tree scalar_mask);
> > +extern tree vect_slp_get_bb_mask (slp_tree, gimple_stmt_iterator *,
> > +                                 unsigned int, tree, unsigned int);
> > +extern void vect_slp_record_bb_len (slp_tree slp_node, unsigned int nvectors,
> > +                                   tree vectype, unsigned int factor);
> > +extern tree vect_slp_get_bb_len (slp_tree, unsigned int, tree, unsigned int,
> > +                                unsigned int, bool);
> >  extern tree prepare_vec_mask (vec_info *, tree, tree, tree,
> >                               gimple_stmt_iterator *);
> >  extern tree vect_get_mask_load_else (int, tree);
> > @@ -2953,7 +2973,7 @@ vect_cannot_use_partial_vectors (vec_info *vinfo, slp_tree slp_node)
> >    if (loop_vinfo)
> >      LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
> >    else
> > -    (void) slp_node; /* FORNOW */
> > +    SLP_TREE_CAN_USE_PARTIAL_VECTORS_P (slp_node) = false;
> >  }
> >
> >  /* Return true if VINFO is vectorizer state for loop vectorization, we've
> > @@ -2967,10 +2987,8 @@ vect_fully_with_length_p (vec_info *vinfo, slp_tree slp_node)
> >    if (loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo))
> >      return LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo);
> >    else
> > -    {
> > -      (void) slp_node; /* FORNOW */
> > -      return false;
> > -    }
> > +    return SLP_TREE_PARTIAL_VECTORS_STYLE (slp_node)
> > +          == vect_partial_vectors_len;
> >  }
> >
> >  /* Return true if VINFO is vectorizer state for loop vectorization, we've
> > @@ -2984,10 +3002,8 @@ vect_fully_masked_p (vec_info *vinfo, slp_tree slp_node)
> >    if (loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo))
> >      return LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> >    else
> > -    {
> > -      (void) slp_node; /* FORNOW */
> > -      return false;
> > -    }
> > +    return SLP_TREE_PARTIAL_VECTORS_STYLE (slp_node)
> > +          == vect_partial_vectors_while_ult;
> >  }
> >
> >  /* If STMT_INFO describes a reduction, return the vect_reduction_type
> > --
> > 2.43.0
> >
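
For reference, the constant-NUNITS tail arithmetic shared by
vect_slp_get_bb_mask and vect_slp_get_bb_len in the patch can be
restated as a standalone function.  This is only a sketch of the
arithmetic with plain integers; active_lanes_sketch is a hypothetical
name and the variable-length (poly_uint64) case is not modelled.

```cpp
#include <cassert>

/* Return the number of active lanes in vector INDEX of NVECTORS when
   GROUP_SIZE scalar lanes are packed into vectors of NUNITS lanes.
   Only the last vector can be partial.  */
inline unsigned int
active_lanes_sketch (unsigned int group_size, unsigned int nunits,
		     unsigned int nvectors, unsigned int index)
{
  assert (nvectors >= 1);
  assert (index < nvectors);

  /* Every vector before the last one is full.  */
  if (index + 1 < nvectors)
    return nunits;

  /* The tail holds whatever the full head vectors did not cover.  */
  const unsigned int head_size = (nvectors - 1) * nunits;
  assert (head_size <= group_size);
  return group_size - head_size;
}
```

For example, 7 lanes in two 4-lane vectors give a full first vector
and a 3-lane tail; the length variant in the patch then multiplies the
result by FACTOR.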
