On Wed, Jun 3, 2026 at 11:25 PM Christopher Bazley <[email protected]> wrote:
>
> Add two new fields to SLP tree nodes, which are accessed as
> SLP_TREE_CAN_USE_PARTIAL_VECTORS_P and SLP_TREE_PARTIAL_VECTORS_STYLE.
>
> SLP_TREE_CAN_USE_PARTIAL_VECTORS_P is analogous to the existing
> predicate LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P. It is initialized to
> true. This flag just records whether the target could vectorize a
> node using a partial vector; it does not say anything about
> whether the vector actually is partial, or how the target would support
> use of a partial vector. Some kinds of node require mask/length for
> partial vectors; others don't. In the latter case (e.g., for add
> operations), SLP_TREE_CAN_USE_PARTIAL_VECTORS_P will remain true.
>
> SLP_TREE_PARTIAL_VECTORS_STYLE is analogous to the existing field
> LOOP_VINFO_PARTIAL_VECTORS_STYLE. Both are initialized to 'none'.
> The vect_partial_vectors_avx512 enumerator is not used for BB SLP.
> Unlike loop vectorization, a different style of partial vectors can be
> chosen for each node during analysis of that node.
>
> Implement the recently-introduced wrapper functions,
> vect_record_(len|mask), for BB SLP by setting
> SLP_TREE_PARTIAL_VECTORS_STYLE to indicate that a mask or length should
> be used for a given SLP node. The passed-in vec_info is ignored.
>
> Implement the vect_fully_(masked|with_length)_p wrapper functions for
> BB SLP by checking the SLP_TREE_PARTIAL_VECTORS_STYLE. This should be
> sufficient because at most one of vect_record_(len|mask) and
> vect_cannot_use_partial_vectors are expected to be called for any
> given SLP node. SLP_TREE_CAN_USE_PARTIAL_VECTORS_P should be true if
> the style is not 'none', but its value isn't used beyond the analysis
> phase.
>
> The implementations of vect_get_mask and vect_get_len for BB SLP are
> non-trivial (albeit simpler than for loop vectorization), therefore they
> are delegated to SLP-specific functions defined in tree-vect-slp.cc.
>
> Implement the vect_cannot_use_partial_vectors wrapper function by
> setting the SLP_TREE_CAN_USE_PARTIAL_VECTORS_P flag to false.
> To prevent regressions, vect_can_use_partial_vectors_p still returns
> false for BB SLP regardless (for now). This prevents vect_record_mask
> or vect_record_len from being called.
>
> gcc/ChangeLog:
>
> * tree-vect-slp.cc (_slp_tree::_slp_tree): initialize new
> partial_vector_style, can_use_partial_vectors and
> num_partial_vectors members.
> (vect_slp_analyze_node_operations): Account for worst-case
> prologue costs of per-node partial-vector mask or length
> materialisation.
> (vect_slp_record_bb_style): Set the partial vector style of an
> SLP node, checking that the style does not flip-flop between mask
> and length.
> (vect_slp_record_bb_mask): Use vect_slp_record_bb_style to set
> the partial vector style of the SLP tree node to
> vect_partial_vectors_while_ult.
> (vect_slp_get_bb_mask): New function to materialize a mask for
> basic block SLP vectorization.
> (vect_slp_record_bb_len): Use vect_slp_record_bb_style to set
> the partial vector style of the SLP tree node to
> vect_partial_vectors_len.
> (vect_slp_get_bb_len): New function to materialize a length for
> basic block SLP vectorization.
> * tree-vect-stmts.cc (vectorizable_internal_function):
> (vect_record_mask): Handle the basic block SLP use case by
> delegating to vect_slp_record_bb_mask.
> (vect_get_mask): Handle the basic block SLP use case by
> delegating to vect_slp_get_bb_mask.
> (vect_record_len): Handle the basic block SLP use case by
> delegating to vect_slp_record_bb_len.
> (vect_get_len): Handle the basic block SLP use case by
> delegating to vect_slp_get_bb_len.
> (vect_gen_while_ssa_name): New function containing code
> refactored out of vect_gen_while for reuse by
> vect_slp_get_bb_mask.
> (vect_gen_while): Use vect_gen_while_ssa_name instead of custom
> code for some of the implementation.
> * tree-vectorizer.h (enum vect_partial_vector_style): Move this
> definition earlier to allow reuse by struct _slp_tree.
> (struct _slp_tree): Add a partial_vector_style member to record
> whether to use a length or mask for the SLP tree node, if
> partial vectors are required and supported.
> Add a can_use_partial_vectors member to record whether partial
> vectors are supported for the SLP tree node.
> Add a num_partial_vectors member for costing.
> (SLP_TREE_PARTIAL_VECTORS_STYLE): New member accessor macro.
> (SLP_TREE_CAN_USE_PARTIAL_VECTORS_P): New member accessor macro.
> (SLP_TREE_NUM_PARTIAL_VECTORS): New member accessor macro.
> (vect_gen_while_ssa_name): Declaration of a new function.
> (vect_slp_get_bb_mask): As above.
> (vect_slp_get_bb_len): As above.
> (vect_cannot_use_partial_vectors): Handle the basic block SLP
> use-case by setting SLP_TREE_CAN_USE_PARTIAL_VECTORS_P to
> false.
> (vect_fully_with_length_p): Handle the basic block SLP use
> case by checking whether the SLP_TREE_PARTIAL_VECTORS_STYLE is
> vect_partial_vectors_len.
> (vect_fully_masked_p): Handle the basic block SLP use case by
> checking whether the SLP_TREE_PARTIAL_VECTORS_STYLE is
> vect_partial_vectors_while_ult.
> ---
> gcc/tree-vect-slp.cc | 182 +++++++++++++++++++++++++++++++++++++++++
> gcc/tree-vect-stmts.cc | 52 +++++++-----
> gcc/tree-vectorizer.h | 52 ++++++++----
> 3 files changed, 247 insertions(+), 39 deletions(-)
>
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 075e93f04a9..4dd7e6e1e21 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -125,6 +125,9 @@ _slp_tree::_slp_tree ()
> SLP_TREE_GS_BASE (this) = NULL_TREE;
> this->ldst_lanes = false;
> this->avoid_stlf_fail = false;
> + SLP_TREE_PARTIAL_VECTORS_STYLE (this) = vect_partial_vectors_none;
> + SLP_TREE_CAN_USE_PARTIAL_VECTORS_P (this) = true;
> + SLP_TREE_NUM_PARTIAL_VECTORS (this) = 0;
> SLP_TREE_VECTYPE (this) = NULL_TREE;
> SLP_TREE_REPRESENTATIVE (this) = NULL;
> this->cycle_info.id = -1;
> @@ -8958,6 +8961,40 @@ vect_slp_analyze_node_operations (vec_info *vinfo,
> slp_tree node,
> vect_prologue_cost_for_slp (vinfo, child, cost_vec);
> }
>
> + if (res)
> + {
> + /* Take care of special costs for partial vectors.
> + Costing each partial vector is excessive for many SLP instances,
> + because it is common to materialise identical masks/lengths for related
> + operations (e.g., for vector loads and stores of the same length).
> + Masks/lengths can also be shared between SLP subgraphs or eliminated by
> + pattern-based lowering during instruction selection. However, it's
> + simpler and safer to use the worst-case cost; if this ends up being the
> + tie-breaker between vectorizing or not, then it's probably better not
> + to vectorize. */
> + const int num_partial_vectors = SLP_TREE_NUM_PARTIAL_VECTORS (node);
> +
> + if (SLP_TREE_PARTIAL_VECTORS_STYLE (node)
> + == vect_partial_vectors_while_ult)
> + {
> + gcc_assert (num_partial_vectors > 0);
> + record_stmt_cost (cost_vec, num_partial_vectors, vector_stmt, NULL,
> + NULL, NULL_TREE, 0, vect_prologue);
> + }
> + else if (SLP_TREE_PARTIAL_VECTORS_STYLE (node)
> + == vect_partial_vectors_len)
> + {
> + /* Need to set up a length in the prologue. */
> + gcc_assert (num_partial_vectors > 0);
> + record_stmt_cost (cost_vec, num_partial_vectors, scalar_stmt, NULL,
> + NULL, NULL_TREE, 0, vect_prologue);
> + }
> + else
> + {
> + gcc_assert (num_partial_vectors == 0);
> + }
> + }
> +
> /* If this node or any of its children can't be vectorized, try pruning
> the tree here rather than felling the whole thing. */
> if (!res && vect_slp_convert_to_external (vinfo, node, node_instance))
> @@ -12441,3 +12478,148 @@ vect_schedule_slp (vec_info *vinfo, const
> vec<slp_instance> &slp_instances)
> }
> }
> }
> +
> +/* Record that a specific partial vector style could be used to vectorize
> + SLP_NODE if required. */
> +
> +static void
> +vect_slp_record_bb_style (slp_tree slp_node, vect_partial_vector_style style)
> +{
> + gcc_assert (style != vect_partial_vectors_none);
> + gcc_assert (style != vect_partial_vectors_avx512);
> +
> + if (SLP_TREE_PARTIAL_VECTORS_STYLE (slp_node) == vect_partial_vectors_none)
> + SLP_TREE_PARTIAL_VECTORS_STYLE (slp_node) = style;
> + else
> + gcc_assert (SLP_TREE_PARTIAL_VECTORS_STYLE (slp_node) == style);
> +}
> +
> +/* Record that a complete set of masks associated with SLP_NODE would need to
> + contain a sequence of NVECTORS masks that each control a vector of type
> + VECTYPE. If SCALAR_MASK is nonnull, the fully-masked loop would AND
> + these vector masks with the vector version of SCALAR_MASK. */
> +void
> +vect_slp_record_bb_mask (slp_tree slp_node, unsigned int /* nvectors */,
> + tree /* vectype */, tree /* scalar_mask */)
> +{
> + vect_slp_record_bb_style (slp_node, vect_partial_vectors_while_ult);
> +
> + /* FORNOW: this often overestimates the number of masks for costing purposes
> + because, after lowering, masks have often been eliminated, shared between
> + SLP nodes, or even shared between SLP subgraphs. */
> + SLP_TREE_NUM_PARTIAL_VECTORS(slp_node) ++;
> +}
> +
> +/* Materialize mask number INDEX for a group of scalar stmts in SLP_NODE that
> + operate on NVECTORS vectors of type VECTYPE, where 0 <= INDEX < NVECTORS.
> + Insert any set-up statements before GSI. */
> +
> +tree
> +vect_slp_get_bb_mask (slp_tree slp_node, gimple_stmt_iterator *gsi,
> + unsigned int nvectors, tree vectype, unsigned int index)
> +{
> + gcc_assert (SLP_TREE_PARTIAL_VECTORS_STYLE (slp_node)
> + == vect_partial_vectors_while_ult);
> + gcc_assert (nvectors >= 1);
> + gcc_assert (index < nvectors);
> +
> + const poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
> + const unsigned int group_size = SLP_TREE_LANES (slp_node);
> + unsigned int mask_size = group_size;
> + const tree masktype = truth_type_for (vectype);
> +
> + if (nunits.is_constant ())
> + {
> + /* Only the last vector can be a partial vector. */
> + if (index + 1 < nvectors)
> + return build_minus_one_cst (masktype);
> +
> + /* Return a mask for a possibly-partial tail vector. */
> + const unsigned int const_nunits = nunits.to_constant ();
> + const unsigned int head_size = (nvectors - 1) * const_nunits;
> + gcc_assert (head_size <= group_size);
> + mask_size = group_size - head_size;
> +
> + if (mask_size == const_nunits)
> + return build_minus_one_cst (masktype);
> + }
> + else
> + {
> + /* Return a mask for a single variable-length vector. */
> + gcc_assert (nvectors == 1);
> + gcc_assert (known_le (mask_size, nunits));
> + }
> +
> + /* FORNOW: don't bother maintaining a set of mask constants to allow
> + sharing between nodes belonging to the same instance of bb_vec_info
> + or even within the same SLP subgraph. */
> + gimple_seq stmts = NULL;
> + const tree cmp_type = size_type_node;
> + const tree start_index = build_zero_cst (cmp_type);
> + const tree end_index = build_int_cst (cmp_type, mask_size);
> + const tree mask = make_temp_ssa_name (masktype, NULL, "slp_mask");
> + vect_gen_while_ssa_name (&stmts, masktype, start_index, end_index, mask);
Not a review, but I hit an ICE when compiling for x86 AVX-512:
./gcc/xgcc -B ./gcc -O3 -march=sapphirerapids slp_pred_1.c -S
during GIMPLE pass: slp
slp_pred_1.c: In function ‘f’:
slp_pred_1.c:11:1: internal compiler error: in vect_gen_while_ssa_name, at tree-vect-stmts.cc:14883
11 | f (uint8_t *x)
| ^
0x26038eb internal_error(char const*, ...)
../../slp_pred_tail/gcc/diagnostic-global-context.cc:787
0x9e8768 fancy_abort(char const*, int, char const*)
../../slp_pred_tail/gcc/diagnostics/context.cc:1813
0x8dca22 vect_gen_while_ssa_name(gimple**, tree_node*, tree_node*, tree_node*, tree_node*)
../../slp_pred_tail/gcc/tree-vect-stmts.cc:14883
0x14f182a vect_slp_get_bb_mask(_slp_tree*, gimple_stmt_iterator*, unsigned int, tree_node*, unsigned int)
../../slp_pred_tail/gcc/tree-vect-slp.cc:12688
0x149cab7 vectorizable_load
../../slp_pred_tail/gcc/tree-vect-stmts.cc:11522
0x14ad760 vect_transform_stmt(vec_info*, _stmt_vec_info*, gimple_stmt_iterator*, _slp_tree*, _slp_instance*)
../../slp_pred_tail/gcc/tree-vect-stmts.cc:13581
0x14eee89 vect_schedule_slp_node
../../slp_pred_tail/gcc/tree-vect-slp.cc:12171
0x15123d1 vect_schedule_slp_node
../../slp_pred_tail/gcc/tree-vect-slp.cc:11940
0x15123d1 vect_schedule_scc
../../slp_pred_tail/gcc/tree-vect-slp.cc:12418
0x151236a vect_schedule_scc
../../slp_pred_tail/gcc/tree-vect-slp.cc:12399
0x151236a vect_schedule_scc
../../slp_pred_tail/gcc/tree-vect-slp.cc:12399
0x1512a49 vect_schedule_slp(vec_info*, vec<_slp_instance*, va_heap, vl_ptr> const&)
../../slp_pred_tail/gcc/tree-vect-slp.cc:12563
0x15145af vect_slp_region
../../slp_pred_tail/gcc/tree-vect-slp.cc:10445
0x151640b vect_slp_bbs
../../slp_pred_tail/gcc/tree-vect-slp.cc:10557
0x15169b4 vect_slp_function(function*)
../../slp_pred_tail/gcc/tree-vect-slp.cc:10679
0x1521ad2 execute
../../slp_pred_tail/gcc/tree-vectorizer.cc:1570
The patch materializes BB SLP tail masks with WHILE_ULT, which x86 does not
support, so the gcc_checking_assert in vect_gen_while_ssa_name fires.
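To make the constant-mask alternative concrete, here is a rough standalone
C++ model of it. This is not GCC code: bb_tail_mask is a hypothetical name,
and it only covers the constant-nunits case. It reuses the head_size /
mask_size arithmetic from vect_slp_get_bb_mask above and returns the lane
values of the mask constant (-1 for an active lane, 0 for an inactive one)
instead of emitting a WHILE_ULT call.

```cpp
#include <cassert>
#include <vector>

// Standalone model (hypothetical, not GCC code): lane values of the constant
// mask for vector number INDEX out of NVECTORS vectors of NUNITS lanes each,
// covering GROUP_SIZE scalar lanes in total.
std::vector<int>
bb_tail_mask (unsigned group_size, unsigned nunits, unsigned nvectors,
              unsigned index)
{
  assert (nvectors >= 1 && index < nvectors);

  // Lanes covered by all vectors before the last one.
  const unsigned head_size = (nvectors - 1) * nunits;
  assert (head_size <= group_size);

  // Only the last vector can be partial; earlier vectors are fully active.
  const unsigned active
    = (index + 1 < nvectors) ? nunits : group_size - head_size;
  assert (active <= nunits);

  std::vector<int> mask (nunits, 0);
  for (unsigned i = 0; i < active; ++i)
    mask[i] = -1;
  return mask;
}
```

For a group of 16 lanes in a single 32-lane vector, this gives 16 active
lanes followed by 16 inactive ones.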
After manually switching to a constant mask for AVX-512, I ran into a
separate performance issue.
If I change slp_pred_1.c to
void
f (uint8_t *x)
{
x[0] += 1;
x[1] += 2;
x[2] += 1;
x[3] += 2;
x[4] += 1;
x[5] += 2;
x[6] += 1;
x[7] += 2;
x[8] += 1;
x[9] += 2;
x[10] += 1;
x[11] += 2;
x[12] += 1;
x[13] += 2;
x[14] += 1;
x[15] += 4;
}
then with -march=sapphirerapids -O3 it generates
<bb 2> [local count: 1073741824]:
vectp.4_51 = x_34(D);
vect__1.5_52 = .MASK_LOAD (vectp.4_51, 8B, { -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0 });
vect__2.6_53 = vect__1.5_52 + { 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2,
1, 2, 1, 4, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1,
4 };
_1 = *x_34(D);
But a 128-bit vector without a mask should be used here, rather than a
256-bit vector with the upper 128 bits masked off:
<bb 2> [local count: 1073741824]:
vectp.4_51 = x_34(D);
vect__1.5_52 = MEM <vector(16) unsigned char> [(uint8_t *)vectp.4_51];
vect__2.6_53 = vect__1.5_52 + { 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2,
1, 2, 1, 4 };
_1 = *x_34(D);
_2 = _1 + 1;
_3 = MEM[(uint8_t *)x_34(D) + 1B];
Similarly, for the original slp-pred-1.c, a 128-bit vector with a mask
should be used instead of a 256-bit vector.
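As a sketch of the selection I have in mind (a standalone C++ model, not a
proposal for the actual vectorizer code): pick the narrowest vector width
whose lane count covers the group, and only ask for a mask when the lanes
do not fill it. The names choose_vector and VecChoice, and the fixed
128/256/512-bit candidate list, are assumptions made for illustration.

```cpp
#include <array>
#include <cassert>

// Hypothetical result type: chosen width, its lane count, and whether a
// mask (or length) is needed because the group does not fill the vector.
struct VecChoice
{
  unsigned bits;
  unsigned nunits;
  bool needs_mask;
};

// Standalone model (not GCC code): choose the narrowest candidate width
// that covers LANES elements of ELEM_BITS bits each.
VecChoice
choose_vector (unsigned lanes, unsigned elem_bits)
{
  const std::array<unsigned, 3> widths = {128, 256, 512};
  for (unsigned bits : widths)
    {
      const unsigned nunits = bits / elem_bits;
      if (nunits >= lanes)
        return {bits, nunits, nunits != lanes};
    }

  // Nothing covers the whole group: fall back to the widest vector; the
  // tail vector is partial unless LANES is a multiple of its lane count.
  const unsigned nunits = widths.back () / elem_bits;
  return {widths.back (), nunits, lanes % nunits != 0};
}
```

With 16 uint8_t lanes, as in the modified test above, this picks the
128-bit vector and needs no mask.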
> + gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
> + return mask;
> +}
> +
> +/* Record that a complete set of lengths associated with SLP_NODE would need
> to
> + contain a sequence of NVECTORS lengths for controlling an operation on
> + VECTYPE. The operation splits each element of VECTYPE into FACTOR
> separate
> + subelements, measuring the length as a number of these subelements. */
> +
> +void
> +vect_slp_record_bb_len (slp_tree slp_node, unsigned int /* nvectors */,
> + tree /* vectype */, unsigned int /* factor */)
> +{
> + vect_slp_record_bb_style (slp_node, vect_partial_vectors_len);
> +
> + /* FORNOW: this probably overestimates the number of lengths for costing
> + purposes because, after lowering, lengths might have been eliminated,
> + shared between SLP nodes, or even shared between SLP subgraphs. */
> + SLP_TREE_NUM_PARTIAL_VECTORS (slp_node)++;
> +}
> +
> +/* Materialize length number INDEX for a group of scalar stmts in SLP_NODE
> that
> + operate on NVECTORS vectors of type VECTYPE, where 0 <= INDEX < NVECTORS.
> + Return a value that contains FACTOR multiplied by the number of elements
> that
> + should be processed. */
> +
> +tree
> +vect_slp_get_bb_len (slp_tree slp_node, unsigned int nvectors, tree vectype,
> + unsigned int index, unsigned int factor, bool adjusted)
> +{
> + gcc_checking_assert (SLP_TREE_PARTIAL_VECTORS_STYLE (slp_node)
> + == vect_partial_vectors_len);
> + gcc_assert (nvectors >= 1);
> + gcc_assert (index < nvectors);
> + (void) adjusted;
> +
> + const poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
> + const unsigned int group_size = SLP_TREE_LANES (slp_node);
> + unsigned int len = group_size;
> +
> + if (nunits.is_constant ())
> + {
> + const unsigned int const_nunits = nunits.to_constant ();
> +
> + /* Only the last vector can be a partial vector. */
> + if (index + 1 < nvectors)
> + len = const_nunits;
> + else
> + {
> + /* Return a length for a possibly-partial tail vector. */
> + const unsigned int head_size = (nvectors - 1) * const_nunits;
> + gcc_assert (head_size <= group_size);
> + len = group_size - head_size;
> + }
> + }
> + else
> + {
> + /* Return a length for a single variable-length vector. */
> + gcc_assert (nvectors == 1);
> + gcc_assert (known_le (len, nunits));
> + }
> +
> + return size_int (len * factor);
> +}
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 15fca17a407..ecad74e7cbf 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -1385,7 +1385,9 @@ vectorizable_internal_function (combined_fn cfn, tree
> fndecl,
> /* Record that a complete set of masks associated with VINFO would need to
> contain a sequence of NVECTORS masks that each control a vector of type
> VECTYPE. If SCALAR_MASK is nonnull, the fully-masked loop would AND
> - these vector masks with the vector version of SCALAR_MASK. */
> + these vector masks with the vector version of SCALAR_MASK. Alternatively,
> + if doing basic block vectorization, record that a mask could be used to
> + vectorize SLP_NODE if required. */
> static void
> vect_record_mask (vec_info *vinfo, slp_tree slp_node, unsigned int nvectors,
> tree vectype, tree scalar_mask)
> @@ -1395,7 +1397,7 @@ vect_record_mask (vec_info *vinfo, slp_tree slp_node,
> unsigned int nvectors,
> vect_record_loop_mask (loop_vinfo, &LOOP_VINFO_MASKS (loop_vinfo),
> nvectors,
> vectype, scalar_mask);
> else
> - (void) slp_node; /* FORNOW */
> + vect_slp_record_bb_mask (slp_node, nvectors, vectype, scalar_mask);
> }
>
> /* Given a complete set of masks associated with VINFO, extract mask number
> @@ -1413,16 +1415,15 @@ vect_get_mask (vec_info *vinfo, slp_tree slp_node,
> gimple_stmt_iterator *gsi,
> return vect_get_loop_mask (loop_vinfo, gsi, &LOOP_VINFO_MASKS
> (loop_vinfo),
> nvectors, vectype, index);
> else
> - {
> - (void) slp_node; /* FORNOW */
> - return NULL_TREE;
> - }
> + return vect_slp_get_bb_mask (slp_node, gsi, nvectors, vectype, index);
> }
>
> /* Record that a complete set of lengths associated with VINFO would need to
> contain a sequence of NVECTORS lengths for controlling an operation on
> VECTYPE. The operation splits each element of VECTYPE into FACTOR
> separate
> - subelements, measuring the length as a number of these subelements. */
> + subelements, measuring the length as a number of these subelements.
> + Alternatively, if doing basic block vectorization, record that a length
> limit
> + could be used to vectorize SLP_NODE if required. */
> static void
> vect_record_len (vec_info *vinfo, slp_tree slp_node, unsigned int nvectors,
> tree vectype, unsigned int factor)
> @@ -1432,7 +1433,7 @@ vect_record_len (vec_info *vinfo, slp_tree slp_node,
> unsigned int nvectors,
> vect_record_loop_len (loop_vinfo, &LOOP_VINFO_LENS (loop_vinfo),
> nvectors,
> vectype, factor);
> else
> - (void) slp_node; /* FORNOW */
> + vect_slp_record_bb_len (slp_node, nvectors, vectype, factor);
> }
>
> /* Given a complete set of lengths associated with VINFO, extract length
> number
> @@ -1453,10 +1454,8 @@ vect_get_len (vec_info *vinfo, slp_tree slp_node,
> gimple_stmt_iterator *gsi,
> return vect_get_loop_len (loop_vinfo, gsi, &LOOP_VINFO_LENS (loop_vinfo),
> nvectors, vectype, index, factor, adjusted);
> else
> - {
> - (void) slp_node; /* FORNOW */
> - return NULL_TREE;
> - }
> + return vect_slp_get_bb_len (slp_node, nvectors, vectype, index, factor,
> + adjusted);
> }
>
> static tree permute_vec_elements (vec_info *, tree, tree, tree,
> stmt_vec_info,
> @@ -14710,24 +14709,35 @@ supportable_indirect_convert_operation (code_helper
> code,
> mask[I] is true iff J + START_INDEX < END_INDEX for all J <= I.
> Add the statements to SEQ. */
>
> +void
> +vect_gen_while_ssa_name (gimple_seq *seq, tree mask_type, tree start_index,
> + tree end_index, tree ssa_name)
> +{
> + tree cmp_type = TREE_TYPE (start_index);
> + gcc_checking_assert (direct_internal_fn_supported_p (IFN_WHILE_ULT,
> cmp_type,
> + mask_type,
> + OPTIMIZE_FOR_SPEED));
> + gcall *call
> + = gimple_build_call_internal (IFN_WHILE_ULT, 3, start_index, end_index,
> + build_zero_cst (mask_type));
> + gimple_call_set_lhs (call, ssa_name);
> + gimple_seq_add_stmt (seq, call);
> +}
> +
> +/* Like vect_gen_while_ssa_name except that it creates a new SSA_NAME node
> + for type MASK_TYPE defined in the created GIMPLE_CALL statement. If NAME
> + is not a null pointer then it is used for the SSA_NAME in dumps. */
> +
> tree
> vect_gen_while (gimple_seq *seq, tree mask_type, tree start_index,
> tree end_index, const char *name)
> {
> - tree cmp_type = TREE_TYPE (start_index);
> - gcc_checking_assert (direct_internal_fn_supported_p (IFN_WHILE_ULT,
> - cmp_type, mask_type,
> - OPTIMIZE_FOR_SPEED));
> - gcall *call = gimple_build_call_internal (IFN_WHILE_ULT, 3,
> - start_index, end_index,
> - build_zero_cst (mask_type));
> tree tmp;
> if (name)
> tmp = make_temp_ssa_name (mask_type, NULL, name);
> else
> tmp = make_ssa_name (mask_type);
> - gimple_call_set_lhs (call, tmp);
> - gimple_seq_add_stmt (seq, call);
> + vect_gen_while_ssa_name (seq, mask_type, start_index, end_index, tmp);
> return tmp;
> }
>
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index a3855568b09..f79f04ff8ac 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -312,6 +312,13 @@ struct vect_load_store_data : vect_data {
> bool subchain_p; // VMAT_STRIDED_SLP and VMAT_GATHER_SCATTER
> };
>
> +enum vect_partial_vector_style {
> + vect_partial_vectors_none,
> + vect_partial_vectors_while_ult,
> + vect_partial_vectors_avx512,
> + vect_partial_vectors_len
> +};
> +
> /* A computation tree of an SLP instance. Each node corresponds to a group
> of
> stmts to be packed in a SIMD stmt. */
> struct _slp_tree {
> @@ -377,7 +384,16 @@ struct _slp_tree {
> /* For BB vect, flag to indicate this load node should be vectorized
> as to avoid STLF fails because of related stores. */
> bool avoid_stlf_fail;
> -
> + /* The style used for implementing partial vectors if LANES is less than
> + the minimum number of lanes implied by the VECTYPE. */
> + vect_partial_vector_style partial_vector_style;
> + /* Flag to indicate whether we still have the option of vectorizing this
> node
> + using partial vectors (i.e. using lengths or masks to prevent use of
> + inactive scalar lanes). */
> + bool can_use_partial_vectors;
> + /* Number of partial vectors, for costing purposes. Should be 0 unless a
> + partial vector style has been set. */
> + int num_partial_vectors;
> int vertex;
>
> /* The kind of operation as determined by analysis and optional
> @@ -476,6 +492,9 @@ public:
> #define SLP_TREE_GS_BASE(S) (S)->gs_base
> #define SLP_TREE_REDUC_IDX(S) (S)->cycle_info.reduc_idx
> #define SLP_TREE_PERMUTE_P(S) ((S)->code == VEC_PERM_EXPR)
> +#define SLP_TREE_PARTIAL_VECTORS_STYLE(S) (S)->partial_vector_style
> +#define SLP_TREE_CAN_USE_PARTIAL_VECTORS_P(S) (S)->can_use_partial_vectors
> +#define SLP_TREE_NUM_PARTIAL_VECTORS(S) (S)->num_partial_vectors
>
> inline vect_memory_access_type
> SLP_TREE_MEMORY_ACCESS_TYPE (slp_tree node)
> @@ -486,13 +505,6 @@ SLP_TREE_MEMORY_ACCESS_TYPE (slp_tree node)
> return VMAT_UNINITIALIZED;
> }
>
> -enum vect_partial_vector_style {
> - vect_partial_vectors_none,
> - vect_partial_vectors_while_ult,
> - vect_partial_vectors_avx512,
> - vect_partial_vectors_len
> -};
> -
> /* Key for map that records association between
> scalar conditions and corresponding loop mask, and
> is populated by vect_record_loop_mask. */
> @@ -2607,6 +2619,7 @@ extern tree vect_gen_perm_mask_checked (tree, const
> vec_perm_indices &);
> extern void optimize_mask_stores (class loop*);
> extern tree vect_gen_while (gimple_seq *, tree, tree, tree,
> const char * = nullptr);
> +extern void vect_gen_while_ssa_name (gimple_seq *, tree, tree, tree, tree);
> extern tree vect_gen_while_not (gimple_seq *, tree, tree, tree);
> extern opt_result vect_get_vector_types_for_stmt (vec_info *,
> stmt_vec_info, tree *,
> @@ -2788,7 +2801,14 @@ extern slp_tree vect_create_new_slp_node (unsigned,
> tree_code);
> extern void vect_free_slp_tree (slp_tree);
> extern bool compatible_calls_p (gcall *, gcall *, bool);
> extern int vect_slp_child_index_for_operand (const stmt_vec_info, int op);
> -
> +extern void vect_slp_record_bb_mask (slp_tree slp_node, unsigned int
> nvectors,
> + tree vectype, tree scalar_mask);
> +extern tree vect_slp_get_bb_mask (slp_tree, gimple_stmt_iterator *,
> + unsigned int, tree, unsigned int);
> +extern void vect_slp_record_bb_len (slp_tree slp_node, unsigned int nvectors,
> + tree vectype, unsigned int factor);
> +extern tree vect_slp_get_bb_len (slp_tree, unsigned int, tree, unsigned int,
> + unsigned int, bool);
> extern tree prepare_vec_mask (vec_info *, tree, tree, tree,
> gimple_stmt_iterator *);
> extern tree vect_get_mask_load_else (int, tree);
> @@ -2953,7 +2973,7 @@ vect_cannot_use_partial_vectors (vec_info *vinfo,
> slp_tree slp_node)
> if (loop_vinfo)
> LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
> else
> - (void) slp_node; /* FORNOW */
> + SLP_TREE_CAN_USE_PARTIAL_VECTORS_P (slp_node) = false;
> }
>
> /* Return true if VINFO is vectorizer state for loop vectorization, we've
> @@ -2967,10 +2987,8 @@ vect_fully_with_length_p (vec_info *vinfo, slp_tree
> slp_node)
> if (loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo))
> return LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo);
> else
> - {
> - (void) slp_node; /* FORNOW */
> - return false;
> - }
> + return SLP_TREE_PARTIAL_VECTORS_STYLE (slp_node)
> + == vect_partial_vectors_len;
> }
>
> /* Return true if VINFO is vectorizer state for loop vectorization, we've
> @@ -2984,10 +3002,8 @@ vect_fully_masked_p (vec_info *vinfo, slp_tree
> slp_node)
> if (loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo))
> return LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> else
> - {
> - (void) slp_node; /* FORNOW */
> - return false;
> - }
> + return SLP_TREE_PARTIAL_VECTORS_STYLE (slp_node)
> + == vect_partial_vectors_while_ult;
> }
>
> /* If STMT_INFO describes a reduction, return the vect_reduction_type
> --
> 2.43.0
>
--
BR,
Hongtao