On Wed, Jun 3, 2026 at 11:25 PM Christopher Bazley <[email protected]> wrote:
>
> Add two new fields to SLP tree nodes, which are accessed as
> SLP_TREE_CAN_USE_PARTIAL_VECTORS_P and SLP_TREE_PARTIAL_VECTORS_STYLE.
>
> SLP_TREE_CAN_USE_PARTIAL_VECTORS_P is analogous to the existing
> predicate LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P. It is initialized to
> true. This flag just records whether the target could vectorize a
> node using a partial vector; it does not say anything about
> whether the vector actually is partial, or how the target would support
> use of a partial vector. Some kinds of node require mask/length for
> partial vectors; others don't. In the latter case (e.g., for add
> operations), SLP_TREE_CAN_USE_PARTIAL_VECTORS_P will remain true.
>
> SLP_TREE_PARTIAL_VECTORS_STYLE is analogous to the existing field
> LOOP_VINFO_PARTIAL_VECTORS_STYLE. Both are initialized to 'none'.
> The vect_partial_vectors_avx512 enumerator is not used for BB SLP.
> Unlike loop vectorization, a different style of partial vectors can be
> chosen for each node during analysis of that node.
>
> Implement the recently-introduced wrapper functions,
> vect_record_(len|mask), for BB SLP by setting
> SLP_TREE_PARTIAL_VECTORS_STYLE to indicate that a mask or length should
> be used for a given SLP node. The passed-in vec_info is ignored.
>
> Implement the vect_fully_(masked|with_length)_p wrapper functions for
> BB SLP by checking the SLP_TREE_PARTIAL_VECTORS_STYLE. This should be
> sufficient because at most one of vect_record_(len|mask) and
> vect_cannot_use_partial_vectors is expected to be called for any
> given SLP node. SLP_TREE_CAN_USE_PARTIAL_VECTORS_P should be true if
> the style is not 'none', but its value isn't used beyond the analysis
> phase.
>
> The implementations of vect_get_mask and vect_get_len for BB SLP are
> non-trivial (albeit simpler than for loop vectorization), therefore they
> are delegated to SLP-specific functions defined in tree-vect-slp.cc.
>
> Implement the vect_cannot_use_partial_vectors wrapper function by
> setting the SLP_TREE_CAN_USE_PARTIAL_VECTORS_P flag to false.
> To prevent regressions, vect_can_use_partial_vectors_p still returns
> false for BB SLP regardless (for now). This prevents vect_record_mask
> or vect_record_len from being called.
>
> gcc/ChangeLog:
>
>         * tree-vect-slp.cc (_slp_tree::_slp_tree): Initialize new
>         partial_vector_style, can_use_partial_vectors and
>         num_partial_vectors members.
>         (vect_slp_analyze_node_operations): Account for worst-case
>         prologue costs of per-node partial-vector mask or length
>         materialisation.
>         (vect_slp_record_bb_style): Set the partial vector style of an
>         SLP node, checking that the style does not flip-flop between mask
>         and length.
>         (vect_slp_record_bb_mask): Use vect_slp_record_bb_style to set
>         the partial vector style of the SLP tree node to
>         vect_partial_vectors_while_ult.
>         (vect_slp_get_bb_mask): New function to materialize a mask for
>         basic block SLP vectorization.
>         (vect_slp_record_bb_len): Use vect_slp_record_bb_style to set
>         the partial vector style of the SLP tree node to
>         vect_partial_vectors_len.
>         (vect_slp_get_bb_len): New function to materialize a length for
>         basic block SLP vectorization.
>         * tree-vect-stmts.cc (vectorizable_internal_function):
>         (vect_record_mask): Handle the basic block SLP use case by
>         delegating to vect_slp_record_bb_mask.
>         (vect_get_mask): Handle the basic block SLP use case by
>         delegating to vect_slp_get_bb_mask.
>         (vect_record_len): Handle the basic block SLP use case by
>         delegating to vect_slp_record_bb_len.
>         (vect_get_len): Handle the basic block SLP use case by
>         delegating to vect_slp_get_bb_len.
>         (vect_gen_while_ssa_name): New function containing code
>         refactored out of vect_gen_while for reuse by
>         vect_slp_get_bb_mask.
>         (vect_gen_while): Use vect_gen_while_ssa_name instead of custom
>         code for some of the implementation.
>         * tree-vectorizer.h (enum vect_partial_vector_style): Move this
>         definition earlier to allow reuse by struct _slp_tree.
>         (struct _slp_tree): Add a partial_vector_style member to record
>         whether to use a length or mask for the SLP tree node, if
>         partial vectors are required and supported.
>         Add a can_use_partial_vectors member to record whether partial
>         vectors are supported for the SLP tree node.
>         Add a num_partial_vectors member for costing.
>         (SLP_TREE_PARTIAL_VECTORS_STYLE): New member accessor macro.
>         (SLP_TREE_CAN_USE_PARTIAL_VECTORS_P): New member accessor macro.
>         (SLP_TREE_NUM_PARTIAL_VECTORS): New member accessor macro.
>         (vect_gen_while_ssa_name): Declaration of a new function.
>         (vect_slp_get_bb_mask): As above.
>         (vect_slp_get_bb_len): As above.
>         (vect_cannot_use_partial_vectors): Handle the basic block SLP
>         use-case by setting SLP_TREE_CAN_USE_PARTIAL_VECTORS_P to
>         false.
>         (vect_fully_with_length_p): Handle the basic block SLP use
>         case by checking whether the SLP_TREE_PARTIAL_VECTORS_STYLE is
>         vect_partial_vectors_len.
>         (vect_fully_masked_p): Handle the basic block SLP use case by
>         checking whether the SLP_TREE_PARTIAL_VECTORS_STYLE is
>         vect_partial_vectors_while_ult.
> ---
>  gcc/tree-vect-slp.cc   | 182 +++++++++++++++++++++++++++++++++++++++++
>  gcc/tree-vect-stmts.cc |  52 +++++++-----
>  gcc/tree-vectorizer.h  |  52 ++++++++----
>  3 files changed, 247 insertions(+), 39 deletions(-)
>
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 075e93f04a9..4dd7e6e1e21 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -125,6 +125,9 @@ _slp_tree::_slp_tree ()
>    SLP_TREE_GS_BASE (this) = NULL_TREE;
>    this->ldst_lanes = false;
>    this->avoid_stlf_fail = false;
> +  SLP_TREE_PARTIAL_VECTORS_STYLE (this) = vect_partial_vectors_none;
> +  SLP_TREE_CAN_USE_PARTIAL_VECTORS_P (this) = true;
> +  SLP_TREE_NUM_PARTIAL_VECTORS (this) = 0;
>    SLP_TREE_VECTYPE (this) = NULL_TREE;
>    SLP_TREE_REPRESENTATIVE (this) = NULL;
>    this->cycle_info.id = -1;
> @@ -8958,6 +8961,40 @@ vect_slp_analyze_node_operations (vec_info *vinfo, slp_tree node,
>           vect_prologue_cost_for_slp (vinfo, child, cost_vec);
>         }
>
> +  if (res)
> +    {
> +      /* Take care of special costs for partial vectors.
> +        Costing each partial vector is excessive for many SLP instances,
> +        because it is common to materialise identical masks/lengths for related
> +        operations (e.g., for vector loads and stores of the same length).
> +        Masks/lengths can also be shared between SLP subgraphs or eliminated by
> +        pattern-based lowering during instruction selection.  However, it's
> +        simpler and safer to use the worst-case cost; if this ends up being the
> +        tie-breaker between vectorizing or not, then it's probably better not
> +        to vectorize.  */
> +      const int num_partial_vectors = SLP_TREE_NUM_PARTIAL_VECTORS (node);
> +
> +      if (SLP_TREE_PARTIAL_VECTORS_STYLE (node)
> +         == vect_partial_vectors_while_ult)
> +       {
> +         gcc_assert (num_partial_vectors > 0);
> +         record_stmt_cost (cost_vec, num_partial_vectors, vector_stmt, NULL,
> +                           NULL, NULL_TREE, 0, vect_prologue);
> +       }
> +      else if (SLP_TREE_PARTIAL_VECTORS_STYLE (node)
> +              == vect_partial_vectors_len)
> +       {
> +         /* Need to set up a length in the prologue.  */
> +         gcc_assert (num_partial_vectors > 0);
> +         record_stmt_cost (cost_vec, num_partial_vectors, scalar_stmt, NULL,
> +                           NULL, NULL_TREE, 0, vect_prologue);
> +       }
> +      else
> +       {
> +         gcc_assert (num_partial_vectors == 0);
> +       }
> +    }
> +
>    /* If this node or any of its children can't be vectorized, try pruning
>       the tree here rather than felling the whole thing.  */
>    if (!res && vect_slp_convert_to_external (vinfo, node, node_instance))
> @@ -12441,3 +12478,148 @@ vect_schedule_slp (vec_info *vinfo, const vec<slp_instance> &slp_instances)
>          }
>      }
>  }
> +
> +/* Record that a specific partial vector style could be used to vectorize
> +   SLP_NODE if required.  */
> +
> +static void
> +vect_slp_record_bb_style (slp_tree slp_node, vect_partial_vector_style style)
> +{
> +  gcc_assert (style != vect_partial_vectors_none);
> +  gcc_assert (style != vect_partial_vectors_avx512);
> +
> +  if (SLP_TREE_PARTIAL_VECTORS_STYLE (slp_node) == vect_partial_vectors_none)
> +    SLP_TREE_PARTIAL_VECTORS_STYLE (slp_node) = style;
> +  else
> +    gcc_assert (SLP_TREE_PARTIAL_VECTORS_STYLE (slp_node) == style);
> +}
> +
> +/* Record that a complete set of masks associated with SLP_NODE would need to
> +   contain a sequence of NVECTORS masks that each control a vector of type
> +   VECTYPE.  If SCALAR_MASK is nonnull, the fully-masked loop would AND
> +   these vector masks with the vector version of SCALAR_MASK.  */
> +void
> +vect_slp_record_bb_mask (slp_tree slp_node, unsigned int /* nvectors */,
> +                        tree /* vectype */, tree /* scalar_mask */)
> +{
> +  vect_slp_record_bb_style (slp_node, vect_partial_vectors_while_ult);
> +
> +  /* FORNOW: this often overestimates the number of masks for costing purposes
> +     because, after lowering, masks have often been eliminated, shared between
> +     SLP nodes, or even shared between SLP subgraphs.  */
> +  SLP_TREE_NUM_PARTIAL_VECTORS(slp_node) ++;
> +}
> +
> +/* Materialize mask number INDEX for a group of scalar stmts in SLP_NODE that
> +   operate on NVECTORS vectors of type VECTYPE, where 0 <= INDEX < NVECTORS.
> +   Insert any set-up statements before GSI.  */
> +
> +tree
> +vect_slp_get_bb_mask (slp_tree slp_node, gimple_stmt_iterator *gsi,
> +                     unsigned int nvectors, tree vectype, unsigned int index)
> +{
> +  gcc_assert (SLP_TREE_PARTIAL_VECTORS_STYLE (slp_node)
> +             == vect_partial_vectors_while_ult);
> +  gcc_assert (nvectors >= 1);
> +  gcc_assert (index < nvectors);
> +
> +  const poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
> +  const unsigned int group_size = SLP_TREE_LANES (slp_node);
> +  unsigned int mask_size = group_size;
> +  const tree masktype = truth_type_for (vectype);
> +
> +  if (nunits.is_constant ())
> +    {
> +      /* Only the last vector can be a partial vector.  */
> +      if (index + 1 < nvectors)
> +       return build_minus_one_cst (masktype);
> +
> +      /* Return a mask for a possibly-partial tail vector. */
> +      const unsigned int const_nunits = nunits.to_constant ();
> +      const unsigned int head_size = (nvectors - 1) * const_nunits;
> +      gcc_assert (head_size <= group_size);
> +      mask_size = group_size - head_size;
> +
> +      if (mask_size == const_nunits)
> +       return build_minus_one_cst (masktype);
> +    }
> +  else
> +    {
> +      /* Return a mask for a single variable-length vector. */
> +      gcc_assert (nvectors == 1);
> +      gcc_assert (known_le (mask_size, nunits));
> +    }
> +
> +  /* FORNOW: don't bother maintaining a set of mask constants to allow
> +     sharing between nodes belonging to the same instance of bb_vec_info
> +     or even within the same SLP subgraph.  */
> +  gimple_seq stmts = NULL;
> +  const tree cmp_type = size_type_node;
> +  const tree start_index = build_zero_cst (cmp_type);
> +  const tree end_index = build_int_cst (cmp_type, mask_size);
> +  const tree mask = make_temp_ssa_name (masktype, NULL, "slp_mask");
> +  vect_gen_while_ssa_name (&stmts, masktype, start_index, end_index, mask);

This is not a review, but I hit an ICE when compiling for x86 with AVX-512:

./gcc/xgcc -B ./gcc -O3 -march=sapphirerapids slp_pred_1.c -S

during GIMPLE pass: slp
slp_pred_1.c: In function ‘f’:
slp_pred_1.c:11:1: internal compiler error: in
vect_gen_while_ssa_name, at tree-vect-stmts.cc:14883
   11 | f (uint8_t *x)
      | ^
0x26038eb internal_error(char const*, ...)
        ../../slp_pred_tail/gcc/diagnostic-global-context.cc:787
0x9e8768 fancy_abort(char const*, int, char const*)
        ../../slp_pred_tail/gcc/diagnostics/context.cc:1813
0x8dca22 vect_gen_while_ssa_name(gimple**, tree_node*, tree_node*,
tree_node*, tree_node*)
        ../../slp_pred_tail/gcc/tree-vect-stmts.cc:14883
0x14f182a vect_slp_get_bb_mask(_slp_tree*, gimple_stmt_iterator*,
unsigned int, tree_node*, unsigned int)
        ../../slp_pred_tail/gcc/tree-vect-slp.cc:12688
0x149cab7 vectorizable_load
        ../../slp_pred_tail/gcc/tree-vect-stmts.cc:11522
0x14ad760 vect_transform_stmt(vec_info*, _stmt_vec_info*,
gimple_stmt_iterator*, _slp_tree*, _slp_instance*)
        ../../slp_pred_tail/gcc/tree-vect-stmts.cc:13581
0x14eee89 vect_schedule_slp_node
        ../../slp_pred_tail/gcc/tree-vect-slp.cc:12171
0x15123d1 vect_schedule_slp_node
        ../../slp_pred_tail/gcc/tree-vect-slp.cc:11940
0x15123d1 vect_schedule_scc
        ../../slp_pred_tail/gcc/tree-vect-slp.cc:12418
0x151236a vect_schedule_scc
        ../../slp_pred_tail/gcc/tree-vect-slp.cc:12399
0x151236a vect_schedule_scc
        ../../slp_pred_tail/gcc/tree-vect-slp.cc:12399
0x1512a49 vect_schedule_slp(vec_info*, vec<_slp_instance*, va_heap,
vl_ptr> const&)
        ../../slp_pred_tail/gcc/tree-vect-slp.cc:12563
0x15145af vect_slp_region
        ../../slp_pred_tail/gcc/tree-vect-slp.cc:10445
0x151640b vect_slp_bbs
        ../../slp_pred_tail/gcc/tree-vect-slp.cc:10557
0x15169b4 vect_slp_function(function*)
        ../../slp_pred_tail/gcc/tree-vect-slp.cc:10679
0x1521ad2 execute
        ../../slp_pred_tail/gcc/tree-vectorizer.cc:1570

The patch materializes BB SLP tail masks with WHILE_ULT, which x86 does not
support, so the direct_internal_fn_supported_p assertion in
vect_gen_while_ssa_name fails.
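
For reference, the lane semantics involved are simple enough to model outside
the compiler. The following is only a standalone sketch, not GCC code:
while_ult_lanes and constant_tail_mask are hypothetical names, and lanes are
plain bools rather than tree constants. It illustrates that, when the number
of lanes is a compile-time constant, the mask that WHILE_ULT (0, mask_size)
would produce can be written down directly as a constant, which is the kind of
fallback a target without WHILE_ULT would need.

```cpp
#include <cstddef>
#include <vector>

// Model of the WHILE_ULT lane semantics quoted in the patch:
// lane I is active iff J + start < end for all J <= I.
std::vector<bool>
while_ult_lanes (std::size_t start, std::size_t end, std::size_t nunits)
{
  std::vector<bool> mask (nunits, false);
  bool active = true;
  for (std::size_t i = 0; i < nunits; ++i)
    {
      active = active && (start + i < end);
      mask[i] = active;
    }
  return mask;
}

// Hypothetical fallback for a fixed-length vector: the first MASK_SIZE
// lanes are active and the rest are inactive, with no WHILE_ULT needed.
std::vector<bool>
constant_tail_mask (std::size_t mask_size, std::size_t nunits)
{
  std::vector<bool> mask (nunits, false);
  for (std::size_t i = 0; i < nunits && i < mask_size; ++i)
    mask[i] = true;
  return mask;
}
```

For 16 active lanes out of 32, both functions give 16 active lanes followed by
16 inactive ones.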


After manually switching to a constant mask for AVX-512, I also ran into a
performance issue.
If I change slp_pred_1.c to
void
f (uint8_t *x)
{
  x[0] += 1;
  x[1] += 2;
  x[2] += 1;
  x[3] += 2;
  x[4] += 1;
  x[5] += 2;
  x[6] += 1;
  x[7] += 2;
  x[8] += 1;
  x[9] += 2;
  x[10] += 1;
  x[11] += 2;
  x[12] += 1;
  x[13] += 2;
  x[14] += 1;
  x[15] += 4;
}
then with -march=sapphirerapids -O3 it generates

 <bb 2> [local count: 1073741824]:
 vectp.4_51 = x_34(D);
 vect__1.5_52 = .MASK_LOAD (vectp.4_51, 8B, { -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0 }, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 });
 vect__2.6_53 = vect__1.5_52 + { 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2,
1, 2, 1, 4, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1,
4 };
 _1 = *x_34(D);

But a 128-bit vector without a mask should be used here, instead of a
256-bit vector with the upper 128 bits masked off:

 <bb 2> [local count: 1073741824]:
 vectp.4_51 = x_34(D);
 vect__1.5_52 = MEM <vector(16) unsigned char> [(uint8_t *)vectp.4_51];
 vect__2.6_53 = vect__1.5_52 + { 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2,
1, 2, 1, 4 };
 _1 = *x_34(D);
 _2 = _1 + 1;
 _3 = MEM[(uint8_t *)x_34(D) + 1B];

Similarly, for the original slp_pred_1.c, a 128-bit vector with a mask
should be used instead of a 256-bit vector.
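
To state the expectation more concretely: pick the narrowest vector width
that covers the group, and only use a mask when the group does not fill it.
The sketch below is a standalone illustration under assumptions of my own,
not how the vectorizer actually selects a vector size: pick_vector_bits and
vector_choice are hypothetical names, and the candidate widths 128/256/512
are assumed for an AVX-512 target.

```cpp
#include <initializer_list>

struct vector_choice
{
  unsigned bits;    // chosen vector width in bits, 0 if no candidate fits
  bool needs_mask;  // true if the group leaves some lanes inactive
};

// Choose the narrowest candidate width that holds GROUP_SIZE elements of
// ELEM_BITS bits each.  A mask is needed only when the group is smaller
// than the chosen vector.
vector_choice
pick_vector_bits (unsigned group_size, unsigned elem_bits)
{
  const unsigned needed = group_size * elem_bits;
  for (unsigned bits : {128u, 256u, 512u})
    if (needed <= bits)
      return {bits, needed < bits};
  return {0, false};
}
```

With the 16 uint8_t lanes of the modified test above, this gives 128 bits and
no mask. A group of fewer than 16 uint8_t lanes would get 128 bits plus a
mask, rather than a wider vector.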

> +  gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
> +  return mask;
> +}
> +
> +/* Record that a complete set of lengths associated with SLP_NODE would need to
> +   contain a sequence of NVECTORS lengths for controlling an operation on
> +   VECTYPE.  The operation splits each element of VECTYPE into FACTOR separate
> +   subelements, measuring the length as a number of these subelements.  */
> +
> +void
> +vect_slp_record_bb_len (slp_tree slp_node, unsigned int /* nvectors */,
> +                       tree /* vectype */, unsigned int /* factor */)
> +{
> +  vect_slp_record_bb_style (slp_node, vect_partial_vectors_len);
> +
> +  /* FORNOW: this probably overestimates the number of lengths for costing
> +     purposes because, after lowering, lengths might have been eliminated,
> +     shared between SLP nodes, or even shared between SLP subgraphs.  */
> +  SLP_TREE_NUM_PARTIAL_VECTORS (slp_node)++;
> +}
> +
> +/* Materialize length number INDEX for a group of scalar stmts in SLP_NODE that
> +   operate on NVECTORS vectors of type VECTYPE, where 0 <= INDEX < NVECTORS.
> +   Return a value that contains FACTOR multiplied by the number of elements that
> +   should be processed.  */
> +
> +tree
> +vect_slp_get_bb_len (slp_tree slp_node, unsigned int nvectors, tree vectype,
> +                    unsigned int index, unsigned int factor, bool adjusted)
> +{
> +  gcc_checking_assert (SLP_TREE_PARTIAL_VECTORS_STYLE (slp_node)
> +                      == vect_partial_vectors_len);
> +  gcc_assert (nvectors >= 1);
> +  gcc_assert (index < nvectors);
> +  (void) adjusted;
> +
> +  const poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
> +  const unsigned int group_size = SLP_TREE_LANES (slp_node);
> +  unsigned int len = group_size;
> +
> +  if (nunits.is_constant ())
> +    {
> +      const unsigned int const_nunits = nunits.to_constant ();
> +
> +      /* Only the last vector can be a partial vector.  */
> +      if (index + 1 < nvectors)
> +       len = const_nunits;
> +      else
> +       {
> +         /* Return a length for a possibly-partial tail vector. */
> +         const unsigned int head_size = (nvectors - 1) * const_nunits;
> +         gcc_assert (head_size <= group_size);
> +         len = group_size - head_size;
> +       }
> +    }
> +  else
> +    {
> +      /* Return a length for a single variable-length vector. */
> +      gcc_assert (nvectors == 1);
> +      gcc_assert (known_le (len, nunits));
> +    }
> +
> +  return size_int (len * factor);
> +}
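
As a worked example of the arithmetic in the constant-nunits branch above,
here is a standalone restatement. It is only a sketch: bb_len_model is a
hypothetical name, plain unsigned integers stand in for poly_uint64 and tree
values, and the variable-length branch is not modelled.

```cpp
#include <cassert>

// Length for vector number INDEX out of NVECTORS, for a group of
// GROUP_SIZE lanes and NUNITS lanes per vector, scaled by FACTOR.
// Only the last vector can be partial.
unsigned
bb_len_model (unsigned group_size, unsigned nunits, unsigned nvectors,
              unsigned index, unsigned factor)
{
  assert (nvectors >= 1);
  assert (index < nvectors);

  unsigned len;
  if (index + 1 < nvectors)
    len = nunits;  // A full vector.
  else
    {
      // A possibly-partial tail vector.
      const unsigned head_size = (nvectors - 1) * nunits;
      assert (head_size <= group_size);
      len = group_size - head_size;
    }
  return len * factor;
}
```

For a group of 13 lanes split over two 8-lane vectors, vector 0 gets length 8
and vector 1 gets length 5 (or 10 with a factor of 2).
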
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 15fca17a407..ecad74e7cbf 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -1385,7 +1385,9 @@ vectorizable_internal_function (combined_fn cfn, tree fndecl,
>  /* Record that a complete set of masks associated with VINFO would need to
>     contain a sequence of NVECTORS masks that each control a vector of type
>     VECTYPE.  If SCALAR_MASK is nonnull, the fully-masked loop would AND
> -   these vector masks with the vector version of SCALAR_MASK.  */
> +   these vector masks with the vector version of SCALAR_MASK.  Alternatively,
> +   if doing basic block vectorization, record that a mask could be used to
> +   vectorize SLP_NODE if required.  */
>  static void
>  vect_record_mask (vec_info *vinfo, slp_tree slp_node, unsigned int nvectors,
>                   tree vectype, tree scalar_mask)
> @@ -1395,7 +1397,7 @@ vect_record_mask (vec_info *vinfo, slp_tree slp_node, unsigned int nvectors,
>      vect_record_loop_mask (loop_vinfo, &LOOP_VINFO_MASKS (loop_vinfo), nvectors,
>                            vectype, scalar_mask);
>    else
> -    (void) slp_node; /* FORNOW */
> +    vect_slp_record_bb_mask (slp_node, nvectors, vectype, scalar_mask);
>  }
>
>  /* Given a complete set of masks associated with VINFO, extract mask number
> @@ -1413,16 +1415,15 @@ vect_get_mask (vec_info *vinfo, slp_tree slp_node, gimple_stmt_iterator *gsi,
>      return vect_get_loop_mask (loop_vinfo, gsi, &LOOP_VINFO_MASKS (loop_vinfo),
>                                nvectors, vectype, index);
>    else
> -    {
> -      (void) slp_node; /* FORNOW */
> -      return NULL_TREE;
> -    }
> +    return vect_slp_get_bb_mask (slp_node, gsi, nvectors, vectype, index);
>  }
>
>  /* Record that a complete set of lengths associated with VINFO would need to
>     contain a sequence of NVECTORS lengths for controlling an operation on
>     VECTYPE.  The operation splits each element of VECTYPE into FACTOR separate
> -   subelements, measuring the length as a number of these subelements.  */
> +   subelements, measuring the length as a number of these subelements.
> +   Alternatively, if doing basic block vectorization, record that a length limit
> +   could be used to vectorize SLP_NODE if required.  */
>  static void
>  vect_record_len (vec_info *vinfo, slp_tree slp_node, unsigned int nvectors,
>                  tree vectype, unsigned int factor)
> @@ -1432,7 +1433,7 @@ vect_record_len (vec_info *vinfo, slp_tree slp_node, unsigned int nvectors,
>      vect_record_loop_len (loop_vinfo, &LOOP_VINFO_LENS (loop_vinfo), nvectors,
>                           vectype, factor);
>    else
> -    (void) slp_node; /* FORNOW */
> +    vect_slp_record_bb_len (slp_node, nvectors, vectype, factor);
>  }
>
>  /* Given a complete set of lengths associated with VINFO, extract length number
> @@ -1453,10 +1454,8 @@ vect_get_len (vec_info *vinfo, slp_tree slp_node, gimple_stmt_iterator *gsi,
>      return vect_get_loop_len (loop_vinfo, gsi, &LOOP_VINFO_LENS (loop_vinfo),
>                               nvectors, vectype, index, factor, adjusted);
>    else
> -    {
> -      (void) slp_node; /* FORNOW */
> -      return NULL_TREE;
> -    }
> +    return vect_slp_get_bb_len (slp_node, nvectors, vectype, index, factor,
> +                               adjusted);
>  }
>
>  static tree permute_vec_elements (vec_info *, tree, tree, tree, stmt_vec_info,
> @@ -14710,24 +14709,35 @@ supportable_indirect_convert_operation (code_helper code,
>     mask[I] is true iff J + START_INDEX < END_INDEX for all J <= I.
>     Add the statements to SEQ.  */
>
> +void
> +vect_gen_while_ssa_name (gimple_seq *seq, tree mask_type, tree start_index,
> +                        tree end_index, tree ssa_name)
> +{
> +  tree cmp_type = TREE_TYPE (start_index);
> +  gcc_checking_assert (direct_internal_fn_supported_p (IFN_WHILE_ULT, cmp_type,
> +                                                      mask_type,
> +                                                      OPTIMIZE_FOR_SPEED));
> +  gcall *call
> +    = gimple_build_call_internal (IFN_WHILE_ULT, 3, start_index, end_index,
> +                                 build_zero_cst (mask_type));
> +  gimple_call_set_lhs (call, ssa_name);
> +  gimple_seq_add_stmt (seq, call);
> +}
> +
> +/*  Like vect_gen_while_ssa_name except that it creates a new SSA_NAME node
> +    for type MASK_TYPE defined in the created GIMPLE_CALL statement.  If NAME
> +    is not a null pointer then it is used for the SSA_NAME in dumps.  */
> +
>  tree
>  vect_gen_while (gimple_seq *seq, tree mask_type, tree start_index,
>                 tree end_index, const char *name)
>  {
> -  tree cmp_type = TREE_TYPE (start_index);
> -  gcc_checking_assert (direct_internal_fn_supported_p (IFN_WHILE_ULT,
> -                                                      cmp_type, mask_type,
> -                                                      OPTIMIZE_FOR_SPEED));
> -  gcall *call = gimple_build_call_internal (IFN_WHILE_ULT, 3,
> -                                           start_index, end_index,
> -                                           build_zero_cst (mask_type));
>    tree tmp;
>    if (name)
>      tmp = make_temp_ssa_name (mask_type, NULL, name);
>    else
>      tmp = make_ssa_name (mask_type);
> -  gimple_call_set_lhs (call, tmp);
> -  gimple_seq_add_stmt (seq, call);
> +  vect_gen_while_ssa_name (seq, mask_type, start_index, end_index, tmp);
>    return tmp;
>  }
>
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index a3855568b09..f79f04ff8ac 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -312,6 +312,13 @@ struct vect_load_store_data : vect_data {
>    bool subchain_p; // VMAT_STRIDED_SLP and VMAT_GATHER_SCATTER
>  };
>
> +enum vect_partial_vector_style {
> +  vect_partial_vectors_none,
> +  vect_partial_vectors_while_ult,
> +  vect_partial_vectors_avx512,
> +  vect_partial_vectors_len
> +};
> +
>  /* A computation tree of an SLP instance.  Each node corresponds to a group of
>     stmts to be packed in a SIMD stmt.  */
>  struct _slp_tree {
> @@ -377,7 +384,16 @@ struct _slp_tree {
>    /* For BB vect, flag to indicate this load node should be vectorized
>       as to avoid STLF fails because of related stores.  */
>    bool avoid_stlf_fail;
> -
> +  /* The style used for implementing partial vectors if LANES is less than
> +     the minimum number of lanes implied by the VECTYPE.  */
> +  vect_partial_vector_style partial_vector_style;
> +  /* Flag to indicate whether we still have the option of vectorizing this node
> +     using partial vectors (i.e.  using lengths or masks to prevent use of
> +     inactive scalar lanes).  */
> +  bool can_use_partial_vectors;
> +  /* Number of partial vectors, for costing purposes. Should be 0 unless a
> +     partial vector style has been set.  */
> +  int num_partial_vectors;
>    int vertex;
>
>    /* The kind of operation as determined by analysis and optional
> @@ -476,6 +492,9 @@ public:
>  #define SLP_TREE_GS_BASE(S)                     (S)->gs_base
>  #define SLP_TREE_REDUC_IDX(S)                   (S)->cycle_info.reduc_idx
>  #define SLP_TREE_PERMUTE_P(S)                   ((S)->code == VEC_PERM_EXPR)
> +#define SLP_TREE_PARTIAL_VECTORS_STYLE(S)       (S)->partial_vector_style
> +#define SLP_TREE_CAN_USE_PARTIAL_VECTORS_P(S)   (S)->can_use_partial_vectors
> +#define SLP_TREE_NUM_PARTIAL_VECTORS(S)         (S)->num_partial_vectors
>
>  inline vect_memory_access_type
>  SLP_TREE_MEMORY_ACCESS_TYPE (slp_tree node)
> @@ -486,13 +505,6 @@ SLP_TREE_MEMORY_ACCESS_TYPE (slp_tree node)
>    return VMAT_UNINITIALIZED;
>  }
>
> -enum vect_partial_vector_style {
> -    vect_partial_vectors_none,
> -    vect_partial_vectors_while_ult,
> -    vect_partial_vectors_avx512,
> -    vect_partial_vectors_len
> -};
> -
>  /* Key for map that records association between
>     scalar conditions and corresponding loop mask, and
>     is populated by vect_record_loop_mask.  */
> @@ -2607,6 +2619,7 @@ extern tree vect_gen_perm_mask_checked (tree, const vec_perm_indices &);
>  extern void optimize_mask_stores (class loop*);
>  extern tree vect_gen_while (gimple_seq *, tree, tree, tree,
>                             const char * = nullptr);
> +extern void vect_gen_while_ssa_name (gimple_seq *, tree, tree, tree, tree);
>  extern tree vect_gen_while_not (gimple_seq *, tree, tree, tree);
>  extern opt_result vect_get_vector_types_for_stmt (vec_info *,
>                                                   stmt_vec_info, tree *,
> @@ -2788,7 +2801,14 @@ extern slp_tree vect_create_new_slp_node (unsigned, tree_code);
>  extern void vect_free_slp_tree (slp_tree);
>  extern bool compatible_calls_p (gcall *, gcall *, bool);
>  extern int vect_slp_child_index_for_operand (const stmt_vec_info, int op);
> -
> +extern void vect_slp_record_bb_mask (slp_tree slp_node, unsigned int nvectors,
> +                                    tree vectype, tree scalar_mask);
> +extern tree vect_slp_get_bb_mask (slp_tree, gimple_stmt_iterator *,
> +                                 unsigned int, tree, unsigned int);
> +extern void vect_slp_record_bb_len (slp_tree slp_node, unsigned int nvectors,
> +                                   tree vectype, unsigned int factor);
> +extern tree vect_slp_get_bb_len (slp_tree, unsigned int, tree, unsigned int,
> +                                unsigned int, bool);
>  extern tree prepare_vec_mask (vec_info *, tree, tree, tree,
>                               gimple_stmt_iterator *);
>  extern tree vect_get_mask_load_else (int, tree);
> @@ -2953,7 +2973,7 @@ vect_cannot_use_partial_vectors (vec_info *vinfo, slp_tree slp_node)
>    if (loop_vinfo)
>      LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
>    else
> -    (void) slp_node; /* FORNOW */
> +    SLP_TREE_CAN_USE_PARTIAL_VECTORS_P (slp_node) = false;
>  }
>
>  /* Return true if VINFO is vectorizer state for loop vectorization, we've
> @@ -2967,10 +2987,8 @@ vect_fully_with_length_p (vec_info *vinfo, slp_tree slp_node)
>    if (loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo))
>      return LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo);
>    else
> -    {
> -      (void) slp_node; /* FORNOW */
> -      return false;
> -    }
> +    return SLP_TREE_PARTIAL_VECTORS_STYLE (slp_node)
> +          == vect_partial_vectors_len;
>  }
>
>  /* Return true if VINFO is vectorizer state for loop vectorization, we've
> @@ -2984,10 +3002,8 @@ vect_fully_masked_p (vec_info *vinfo, slp_tree slp_node)
>    if (loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo))
>      return LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
>    else
> -    {
> -      (void) slp_node; /* FORNOW */
> -      return false;
> -    }
> +    return SLP_TREE_PARTIAL_VECTORS_STYLE (slp_node)
> +          == vect_partial_vectors_while_ult;
>  }
>
>  /* If STMT_INFO describes a reduction, return the vect_reduction_type
> --
> 2.43.0
>


-- 
BR,
Hongtao
