On Mon, Nov 24, 2025 at 8:01 PM Christopher Bazley <[email protected]> wrote:
>
> To decide whether to create a new SLP instance for BB SLP,
> vect_analyze_slp_instance will need the minimum number of lanes
> in the SLP tree, which must not be less than the group size
> (otherwise "unrolling" is required). All usage of max_nunits
> is therefore replaced with a new class that encapsulates
> both minimum and maximum.
>
> For now, the minimum value is unused.

Tracking minimum and maximum nunits on each SLP node is overkill,
and the way we accumulate those into a global minimum/maximum on
an SLP graph entry is not suitable for computing "unrolling" or a split
point for the entry node's group size. I'd like to see us do away with
tracking nunits altogether.

In fact, the maybe_ne (unrolling_factor, 1U) code in vect_analyze_slp_instance
and vect_build_slp_instance is currently unreachable because of the check
already present in vect_record_max_nunits.

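To illustrate the arithmetic with plain integers (a sketch only: it assumes constant nunits, and unrolling_factor and bb_check_passes are made-up stand-ins for calculate_unrolling_factor and the multiple_p test, not the poly_int code in the tree):

```cpp
#include <cassert>
#include <numeric>

// Hypothetical stand-in for calculate_unrolling_factor, restricted to
// constant nunits: lcm (nunits, group_size) / group_size.
unsigned unrolling_factor (unsigned nunits, unsigned group_size)
{
  return std::lcm (nunits, group_size) / group_size;
}

// Hypothetical stand-in for the basic-block check in
// vect_record_max_nunits: the group size must be a multiple of nunits.
bool bb_check_passes (unsigned nunits, unsigned group_size)
{
  return group_size % nunits == 0;
}
```

Whenever bb_check_passes holds, the least common multiple is the group size itself, so the factor is 1 and a test for a factor other than 1 cannot fire afterwards.
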
The only other use, in vect_update_slp_vf_for_node, should be replaced by
the local SLP_TREE_VECTYPE, as in the attached patch. I've had this on
the "later" list for some time now because, as long as we had both SLP
and non-SLP, the vect_maybe_update_slp_op_vectype code is not
really taking advantage of such local decisions, and it gets it "wrong" for
invariants, for example in vectorizable_conversion, leading to ICEs
in gcc.dg/vect/O3-pr87546.c and gcc.dg/vect/O3-vect-pr32243.c.
This could be mitigated locally in vect_update_slp_vf_for_node,
but the real fix is to not require extra unrolling because of constant/invariant
nodes.

That said, I really want to get rid of max_nunits, not add to it. Instead,
for the purpose of splitting and checking vector type validity for BB
vectorization, I'd resort to a walk over the graph, like
vect_update_slp_vf_for_node does, deciding on splitting and predication.
This change to track not only maximum but also minimum nunits goes in the
wrong direction. For the purpose of this patch I suggest computing what you
need for the "minimum" with a new SLP graph walk instead.

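Roughly what I have in mind, as a sketch only (the node type and walk_min_nunits are hypothetical simplifications, not _slp_tree or any existing function; nunits is a plain integer here rather than a poly_uint64, with 0 meaning "no vector type assigned"):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Simplified, hypothetical stand-in for an SLP node.
struct node
{
  uint64_t nunits;               // units of the node's own vector type, 0 if none
  std::vector<node *> children;  // operands; the graph may share nodes or cycle
};

// Walk the graph reachable from N, visiting each node once, and fold each
// node's local nunits into MIN_NUNITS.
void walk_min_nunits (node *n, std::unordered_set<node *> &visited,
                      uint64_t &min_nunits)
{
  if (!n || !visited.insert (n).second)
    return;
  if (n->nunits != 0)
    min_nunits = std::min (min_nunits, n->nunits);
  for (node *child : n->children)
    walk_min_nunits (child, visited, min_nunits);
}
```

The caller would start with min_nunits at UINT64_MAX and an empty visited set. The value is derived from each node's local vector type when it is needed, so nothing has to be stored or accumulated during SLP discovery.
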
I have attached the patch that should ideally work (but doesn't, for the
reason pointed out above). It would make max_nunits unused.

Richard.

> gcc/ChangeLog:
>
> * tree-vect-slp.cc (_slp_tree::_slp_tree): Initialize the lower and
> upper bounds for an SLP tree node instead of only the upper bound.
> The lower bound is UINT64_MAX to allow it to grow upwards.
> (vect_record_max_nunits): Renamed as vect_record_nunits.
> (vect_record_nunits): Change parameter type from poly_uint64 * to
> slp_tree_nunits * and rename from max_nunits to nunits.
> Call vect_update_nunits instead of vect_update_max_nunits.
> (vect_build_slp_tree_1): Change parameter type from poly_uint64 * to
> slp_tree_nunits * and rename from max_nunits to nunits.
> Update for renaming of the vect_record_max_nunits function.
> (vect_build_slp_tree_2): Change parameter type from poly_uint64 * to
> slp_tree_nunits * and rename from max_nunits to nunits.
> Substitute local variable this_nunits of type slp_tree_nunits for
> this_max_nunits of type poly_uint64.
> Update for renaming of the vect_record_max_nunits function and the
> max_nunits member of _slp_tree.
> (vect_build_slp_tree): Change parameter type from poly_uint64 * to
> slp_tree_nunits * and rename from max_nunits to nunits.
> Update for renaming of the max_nunits member of _slp_tree.
> Substitute local variable this_nunits of type slp_tree_nunits for
> this_max_nunits of type poly_uint64.
> Rely on the bounds being initialized to the default member values.
> Call vect_update_nunits instead of vect_update_max_nunits.
> (vect_print_slp_tree): Dump nunits.min and nunits.max of the
> _slp_tree instead of the max_nunits member they replace.
> (calculate_unrolling_factor): Update parameter type from poly_uint64
> to slp_tree_nunits. Use the nunits.max member.
> (optimize_load_redistribution_1): Substitute local variable nunits of
> type slp_tree_nunits for max_nunits of type poly_uint64.
> Rely on the bounds being initialized to the default member values.
> (vect_build_slp_store_interleaving): Update parameter type from
> poly_uint64 to slp_tree_nunits and rename from max_nunits to nunits.
> Update for renaming of the max_nunits member of _slp_tree.
> (vect_build_slp_instance): Substitute local variable nunits of type
> slp_tree_nunits for max_nunits of type poly_uint64.
> Rely on the bounds being initialized to the default member values.
> (vect_analyze_slp_reduc_chain): As above.
> (vect_analyze_slp_reduction): As above.
> (vect_analyze_slp_instance): Substitute local variable nunits of type
> slp_tree_nunits for max_nunits of type poly_uint64.
> Rely on the bounds being initialized to the default member values but
> also explicitly reinitialize bounds to their defaults before
> each invocation of vect_build_slp_tree when trying to break a group
> into pieces.
> Improve a diagnostic message printed when this function fails.
> Call vect_update_nunits instead of vect_update_max_nunits.
> (vect_lower_load_permutations): Substitute local variable nunits of
> type slp_tree_nunits for max_nunits of type poly_uint64.
> Rely on the bounds being initialized to the default member values.
> (vect_update_slp_vf_for_node): Update for renaming of the max_nunits
> member of _slp_tree.
> * tree-vectorizer.h (struct slp_tree_nunits): New type definition
> to represent the minimum and maximum number of vector elements for
> a subtree.
> (struct _slp_tree): Replace the max_nunits member of type poly_uint64
> with nunits of type slp_tree_nunits
> (vect_update_nunits): New function to update the range stored in one
> instance of slp_tree_nunits so that it becomes a superset of the range
> stored in another.
> Call vect_update_max_nunits internally so calls to the new function
> can be substituted for calls to the existing function.
> On return from vect_update_max_nunits, reduce the minimum bound if
> applicable. Both minima are compared explicitly against UINT64_MAX
> (the initial value, meaning 'empty') to avoid invalid use of
> ordered_min. (UINT64_MAX is huge but not polynomial.)
>
> ---
> gcc/tree-vect-slp.cc | 162 +++++++++++++++++++++---------------------
> gcc/tree-vectorizer.h | 46 +++++++++++-
> 2 files changed, 125 insertions(+), 83 deletions(-)
>
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 0ab15fde469..2369319b6ea 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -130,7 +130,7 @@ _slp_tree::_slp_tree ()
> this->cycle_info.reduc_idx = -1;
> SLP_TREE_REF_COUNT (this) = 1;
> this->failed = NULL;
> - this->max_nunits = 1;
> + this->nunits = {UINT64_MAX, 1};
> this->lanes = 0;
> SLP_TREE_TYPE (this) = undef_vec_info_type;
> this->data = NULL;
> @@ -1051,14 +1051,14 @@ compatible_calls_p (gcall *call1, gcall *call2, bool
> allow_two_operators)
> /* A subroutine of vect_build_slp_tree for checking VECTYPE, which is the
> caller's attempt to find the vector type in STMT_INFO with the narrowest
> element type. Return true if VECTYPE is nonnull and if it is valid
> - for STMT_INFO. When returning true, update MAX_NUNITS to reflect the
> - number of units in VECTYPE. GROUP_SIZE and MAX_NUNITS are as for
> + for STMT_INFO. When returning true, update *NUNITS to reflect the
> + number of units in VECTYPE. GROUP_SIZE and NUNITS are as for
> vect_build_slp_tree. */
>
> static bool
> -vect_record_max_nunits (vec_info *vinfo, stmt_vec_info stmt_info,
> - unsigned int group_size,
> - tree vectype, poly_uint64 *max_nunits)
> +vect_record_nunits (vec_info *vinfo, stmt_vec_info stmt_info,
> + unsigned int group_size, tree vectype,
> + slp_tree_nunits *nunits)
> {
> if (!vectype)
> {
> @@ -1071,7 +1071,7 @@ vect_record_max_nunits (vec_info *vinfo, stmt_vec_info
> stmt_info,
> }
>
> /* If populating the vector type requires unrolling then fail
> - before adjusting *max_nunits for basic-block vectorization. */
> + before adjusting *nunits for basic-block vectorization. */
> if (is_a <bb_vec_info> (vinfo)
> && !multiple_p (group_size, TYPE_VECTOR_SUBPARTS (vectype)))
> {
> @@ -1084,7 +1084,7 @@ vect_record_max_nunits (vec_info *vinfo, stmt_vec_info
> stmt_info,
> }
>
> /* In case of multiple types we need to detect the smallest type. */
> - vect_update_max_nunits (max_nunits, vectype);
> + vect_update_nunits (nunits, vectype);
> return true;
> }
>
> @@ -1105,7 +1105,7 @@ vect_record_max_nunits (vec_info *vinfo, stmt_vec_info
> stmt_info,
> static bool
> vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
> vec<stmt_vec_info> stmts, unsigned int group_size,
> - poly_uint64 *max_nunits, bool *matches,
> + slp_tree_nunits *nunits, bool *matches,
> bool *two_operators, tree *node_vectype)
> {
> unsigned int i;
> @@ -1145,8 +1145,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char
> *swap,
> as if nunits was not an issue. This allows splitting of groups
> to happen. */
> if (nunits_vectype
> - && !vect_record_max_nunits (vinfo, first_stmt_info, group_size,
> - nunits_vectype, max_nunits))
> + && !vect_record_nunits (vinfo, first_stmt_info, group_size,
> + nunits_vectype, nunits))
> {
> gcc_assert (is_a <bb_vec_info> (vinfo));
> maybe_soft_fail = true;
> @@ -1828,14 +1828,14 @@ vect_slp_linearize_chain (vec_info *vinfo,
> static slp_tree
> vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
> vec<stmt_vec_info> stmts, unsigned int group_size,
> - poly_uint64 *max_nunits,
> + slp_tree_nunits *nunits,
> bool *matches, unsigned *limit, unsigned *tree_size,
> scalar_stmts_to_slp_tree_map_t *bst_map);
>
> static slp_tree
> vect_build_slp_tree (vec_info *vinfo,
> vec<stmt_vec_info> stmts, unsigned int group_size,
> - poly_uint64 *max_nunits,
> + slp_tree_nunits *nunits,
> bool *matches, unsigned *limit, unsigned *tree_size,
> scalar_stmts_to_slp_tree_map_t *bst_map)
> {
> @@ -1848,7 +1848,7 @@ vect_build_slp_tree (vec_info *vinfo,
> if (!(*leader)->failed)
> {
> SLP_TREE_REF_COUNT (*leader)++;
> - vect_update_max_nunits (max_nunits, (*leader)->max_nunits);
> + vect_update_nunits (nunits, (*leader)->nunits);
> stmts.release ();
> return *leader;
> }
> @@ -1882,9 +1882,9 @@ vect_build_slp_tree (vec_info *vinfo,
> dump_printf_loc (MSG_NOTE, vect_location,
> "starting SLP discovery for node %p\n", (void *) res);
>
> - poly_uint64 this_max_nunits = 1;
> + slp_tree_nunits this_nunits{};
> slp_tree res_ = vect_build_slp_tree_2 (vinfo, res, stmts, group_size,
> - &this_max_nunits,
> + &this_nunits,
> matches, limit, tree_size, bst_map);
> if (!res_)
> {
> @@ -1913,8 +1913,8 @@ vect_build_slp_tree (vec_info *vinfo,
> "SLP discovery for node %p succeeded\n",
> (void *) res);
> gcc_assert (res_ == res);
> - res->max_nunits = this_max_nunits;
> - vect_update_max_nunits (max_nunits, this_max_nunits);
> + res->nunits = this_nunits;
> + vect_update_nunits (nunits, this_nunits);
> /* Keep a reference for the bst_map use. */
> SLP_TREE_REF_COUNT (res)++;
> }
> @@ -1972,12 +1972,12 @@ vect_slp_build_two_operator_nodes (slp_tree perm,
> tree vectype,
> static slp_tree
> vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
> vec<stmt_vec_info> stmts, unsigned int group_size,
> - poly_uint64 *max_nunits,
> + slp_tree_nunits *nunits,
> bool *matches, unsigned *limit, unsigned *tree_size,
> scalar_stmts_to_slp_tree_map_t *bst_map)
> {
> unsigned nops, i, this_tree_size = 0;
> - poly_uint64 this_max_nunits = *max_nunits;
> + slp_tree_nunits this_nunits = *nunits;
>
> matches[0] = false;
>
> @@ -2003,8 +2003,7 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
> tree scalar_type = TREE_TYPE (PHI_RESULT (stmt));
> tree vectype = get_vectype_for_scalar_type (vinfo, scalar_type,
> group_size);
> - if (!vect_record_max_nunits (vinfo, stmt_info, group_size, vectype,
> - max_nunits))
> + if (!vect_record_nunits (vinfo, stmt_info, group_size, vectype, nunits))
> return NULL;
>
> vect_def_type def_type = STMT_VINFO_DEF_TYPE (stmt_info);
> @@ -2057,7 +2056,7 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
> unsigned char *swap = XALLOCAVEC (unsigned char, group_size);
> tree vectype = NULL_TREE;
> if (!vect_build_slp_tree_1 (vinfo, swap, stmts, group_size,
> - &this_max_nunits, matches, &two_operators,
> + &this_nunits, matches, &two_operators,
> &vectype))
> return NULL;
>
> @@ -2069,7 +2068,7 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
> gcc_assert (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info)));
> else
> {
> - *max_nunits = this_max_nunits;
> + *nunits = this_nunits;
> (*tree_size)++;
> node = vect_create_new_slp_node (node, stmts, 0);
> SLP_TREE_VECTYPE (node) = vectype;
> @@ -2154,7 +2153,7 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
> bool *matches2 = XALLOCAVEC (bool, dr_group_size);
> slp_tree unperm_load
> = vect_build_slp_tree (vinfo, stmts2, dr_group_size,
> - &this_max_nunits, matches2, limit,
> + &this_nunits, matches2, limit,
> &this_tree_size, bst_map);
> /* When we are able to do the full masked load emit that
> followed by 'node' being the desired final permutation.
> */
> @@ -2457,7 +2456,7 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
> else
> op_stmts.quick_push (NULL);
> child = vect_build_slp_tree (vinfo, op_stmts,
> - group_size, &this_max_nunits,
> + group_size, &this_nunits,
> matches, limit,
> &this_tree_size, bst_map);
> /* ??? We're likely getting too many fatal mismatches
> @@ -2613,7 +2612,7 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
> children[i] = child;
> }
> *tree_size += this_tree_size + 1;
> - *max_nunits = this_max_nunits;
> + *nunits = this_nunits;
> while (!chains.is_empty ())
> chains.pop ().release ();
> return node;
> @@ -2892,7 +2891,7 @@ out:
> def_stmts2.create (1);
> def_stmts2.quick_push (oprnd_info->def_stmts[0]);
> child = vect_build_slp_tree (vinfo, def_stmts2, 1,
> - &this_max_nunits,
> + &this_nunits,
> matches, limit,
> &this_tree_size, bst_map);
> if (child)
> @@ -2910,7 +2909,7 @@ out:
> .quick_push (std::make_pair (0u, 0u));
> }
> SLP_TREE_CHILDREN (pnode).quick_push (child);
> - pnode->max_nunits = child->max_nunits;
> + pnode->nunits = child->nunits;
> children.safe_push (pnode);
> oprnd_info->def_stmts = vNULL;
> continue;
> @@ -2920,7 +2919,7 @@ out:
> }
>
> if ((child = vect_build_slp_tree (vinfo, oprnd_info->def_stmts,
> - group_size, &this_max_nunits,
> + group_size, &this_nunits,
> matches, limit,
> &this_tree_size, bst_map)) != NULL)
> {
> @@ -3009,7 +3008,7 @@ out:
> /* And try again with scratch 'matches' ... */
> bool *tem = XALLOCAVEC (bool, group_size);
> if ((child = vect_build_slp_tree (vinfo, oprnd_info->def_stmts,
> - group_size, &this_max_nunits,
> + group_size, &this_nunits,
> tem, limit,
> &this_tree_size, bst_map)) !=
> NULL)
> {
> @@ -3115,7 +3114,7 @@ fail:
> }
>
> *tree_size += this_tree_size + 1;
> - *max_nunits = this_max_nunits;
> + *nunits = this_nunits;
>
> if (two_operators)
> {
> @@ -3261,16 +3260,15 @@ vect_print_slp_tree (dump_flags_t dump_kind,
> dump_location_t loc,
>
> dump_metadata_t metadata (dump_kind, loc.get_impl_location ());
> dump_user_location_t user_loc = loc.get_user_location ();
> - dump_printf_loc (metadata, user_loc,
> - "node%s %p (max_nunits=" HOST_WIDE_INT_PRINT_UNSIGNED
> - ", refcnt=%u)",
> - SLP_TREE_DEF_TYPE (node) == vect_external_def
> - ? " (external)"
> - : (SLP_TREE_DEF_TYPE (node) == vect_constant_def
> - ? " (constant)"
> - : ""), (void *) node,
> - estimated_poly_value (node->max_nunits),
> - SLP_TREE_REF_COUNT (node));
> + dump_printf_loc (
> + metadata, user_loc,
> + "node%s %p (nunits.min=" HOST_WIDE_INT_PRINT_UNSIGNED
> + ", nunits.max=" HOST_WIDE_INT_PRINT_UNSIGNED ", refcnt=%u)",
> + SLP_TREE_DEF_TYPE (node) == vect_external_def
> + ? " (external)"
> + : (SLP_TREE_DEF_TYPE (node) == vect_constant_def ? " (constant)" : ""),
> + (void *) node, estimated_poly_value (node->nunits.min),
> + estimated_poly_value (node->nunits.max), SLP_TREE_REF_COUNT (node));
> if (SLP_TREE_VECTYPE (node))
> dump_printf (metadata, " %T", SLP_TREE_VECTYPE (node));
> dump_printf (metadata, "%s",
> @@ -3637,9 +3635,9 @@ vect_split_slp_store_group (stmt_vec_info first_vinfo,
> unsigned group1_size)
> statements and a vector of NUNITS elements. */
>
> static poly_uint64
> -calculate_unrolling_factor (poly_uint64 nunits, unsigned int group_size)
> +calculate_unrolling_factor (slp_tree_nunits nunits, unsigned int group_size)
> {
> - return exact_div (common_multiple (nunits, group_size), group_size);
> + return exact_div (common_multiple (nunits.max, group_size), group_size);
> }
>
> /* Helper that checks to see if a node is a load node. */
> @@ -3701,9 +3699,9 @@ optimize_load_redistribution_1
> (scalar_stmts_to_slp_tree_map_t *bst_map,
> (void *) root);
>
> bool *matches = XALLOCAVEC (bool, group_size);
> - poly_uint64 max_nunits = 1;
> + slp_tree_nunits nunits{};
> unsigned tree_size = 0, limit = 1;
> - node = vect_build_slp_tree (vinfo, stmts, group_size, &max_nunits,
> + node = vect_build_slp_tree (vinfo, stmts, group_size, &nunits,
> matches, &limit, &tree_size, bst_map);
> if (!node)
> stmts.release ();
> @@ -3886,14 +3884,14 @@ vect_analyze_slp_instance (vec_info *vinfo,
> static slp_tree
> vect_build_slp_store_interleaving (vec<slp_tree> &rhs_nodes,
> vec<stmt_vec_info> &scalar_stmts,
> - poly_uint64 max_nunits)
> + slp_tree_nunits nunits)
> {
> unsigned int group_size = scalar_stmts.length ();
> slp_tree node = vect_create_new_slp_node (scalar_stmts,
> SLP_TREE_CHILDREN
> (rhs_nodes[0]).length ());
> SLP_TREE_VECTYPE (node) = SLP_TREE_VECTYPE (rhs_nodes[0]);
> - node->max_nunits = max_nunits;
> + node->nunits = nunits;
> for (unsigned l = 0;
> l < SLP_TREE_CHILDREN (rhs_nodes[0]).length (); ++l)
> {
> @@ -3903,7 +3901,7 @@ vect_build_slp_store_interleaving (vec<slp_tree>
> &rhs_nodes,
> SLP_TREE_CHILDREN (node).quick_push (perm);
> SLP_TREE_LANE_PERMUTATION (perm).create (group_size);
> SLP_TREE_VECTYPE (perm) = SLP_TREE_VECTYPE (node);
> - perm->max_nunits = max_nunits;
> + perm->nunits = nunits;
> SLP_TREE_LANES (perm) = group_size;
> /* ??? We should set this NULL but that's not expected. */
> SLP_TREE_REPRESENTATIVE (perm)
> @@ -3959,7 +3957,7 @@ vect_build_slp_store_interleaving (vec<slp_tree>
> &rhs_nodes,
> SLP_TREE_LANES (permab) = n;
> SLP_TREE_LANE_PERMUTATION (permab).create (n);
> SLP_TREE_VECTYPE (permab) = SLP_TREE_VECTYPE (perm);
> - permab->max_nunits = max_nunits;
> + permab->nunits = nunits;
> /* ??? Should be NULL but that's not expected. */
> SLP_TREE_REPRESENTATIVE (permab) = SLP_TREE_REPRESENTATIVE
> (perm);
> SLP_TREE_CHILDREN (permab).quick_push (a);
> @@ -4030,7 +4028,7 @@ vect_build_slp_store_interleaving (vec<slp_tree>
> &rhs_nodes,
> SLP_TREE_LANES (permab) = n;
> SLP_TREE_LANE_PERMUTATION (permab).create (n);
> SLP_TREE_VECTYPE (permab) = SLP_TREE_VECTYPE (perm);
> - permab->max_nunits = max_nunits;
> + permab->nunits = nunits;
> /* ??? Should be NULL but that's not expected. */
> SLP_TREE_REPRESENTATIVE (permab) = SLP_TREE_REPRESENTATIVE (perm);
> SLP_TREE_CHILDREN (permab).quick_push (a);
> @@ -4115,7 +4113,7 @@ vect_build_slp_instance (vec_info *vinfo,
> /* Build the tree for the SLP instance. */
> unsigned int group_size = scalar_stmts.length ();
> bool *matches = XALLOCAVEC (bool, group_size);
> - poly_uint64 max_nunits = 1;
> + slp_tree_nunits nunits{};
> unsigned tree_size = 0;
>
> slp_tree node = NULL;
> @@ -4126,19 +4124,19 @@ vect_build_slp_instance (vec_info *vinfo,
> }
> else
> node = vect_build_slp_tree (vinfo, scalar_stmts, group_size,
> - &max_nunits, matches, limit,
> + &nunits, matches, limit,
> &tree_size, bst_map);
> if (node != NULL)
> {
> /* Calculate the unrolling factor based on the smallest type. */
> poly_uint64 unrolling_factor
> - = calculate_unrolling_factor (max_nunits, group_size);
> + = calculate_unrolling_factor (nunits, group_size);
>
> if (maybe_ne (unrolling_factor, 1U)
> && is_a <bb_vec_info> (vinfo))
> {
> unsigned HOST_WIDE_INT const_max_nunits;
> - if (!max_nunits.is_constant (&const_max_nunits)
> + if (!nunits.max.is_constant (&const_max_nunits)
> || const_max_nunits > group_size)
> {
> if (dump_enabled_p ())
> @@ -4376,10 +4374,10 @@ vect_analyze_slp_reduc_chain (loop_vec_info vinfo,
>
> unsigned int group_size = scalar_stmts.length ();
> bool *matches = XALLOCAVEC (bool, group_size);
> - poly_uint64 max_nunits = 1;
> + slp_tree_nunits nunits{};
> unsigned tree_size = 0;
> slp_tree node = vect_build_slp_tree (vinfo, scalar_stmts, group_size,
> - &max_nunits, matches, limit,
> + &nunits, matches, limit,
> &tree_size, bst_map);
> if (!node)
> {
> @@ -4519,7 +4517,7 @@ vect_analyze_slp_reduc_chain (loop_vec_info vinfo,
> /* Build the tree for the SLP instance. */
> unsigned int group_size = scalar_stmts.length ();
> bool *matches = XALLOCAVEC (bool, group_size);
> - poly_uint64 max_nunits = 1;
> + slp_tree_nunits nunits{};
> unsigned tree_size = 0;
>
> /* ??? We need this only for SLP discovery. */
> @@ -4527,7 +4525,7 @@ vect_analyze_slp_reduc_chain (loop_vec_info vinfo,
> REDUC_GROUP_FIRST_ELEMENT (scalar_stmts[i]) = scalar_stmts[0];
>
> slp_tree node = vect_build_slp_tree (vinfo, scalar_stmts, group_size,
> - &max_nunits, matches, limit,
> + &nunits, matches, limit,
> &tree_size, bst_map);
>
> for (unsigned i = 0; i < scalar_stmts.length (); ++i)
> @@ -4669,11 +4667,11 @@ vect_analyze_slp_reduction (loop_vec_info vinfo,
> /* Build the tree for the SLP instance. */
> unsigned int group_size = scalar_stmts.length ();
> bool *matches = XALLOCAVEC (bool, group_size);
> - poly_uint64 max_nunits = 1;
> + slp_tree_nunits nunits{};
> unsigned tree_size = 0;
>
> slp_tree node = vect_build_slp_tree (vinfo, scalar_stmts, group_size,
> - &max_nunits, matches, limit,
> + &nunits, matches, limit,
> &tree_size, bst_map);
> if (node != NULL)
> {
> @@ -4738,11 +4736,11 @@ vect_analyze_slp_reduction_group (loop_vec_info
> loop_vinfo,
> unsigned int group_size = scalar_stmts.length ();
> if (!matches)
> matches = XALLOCAVEC (bool, group_size);
> - poly_uint64 max_nunits = 1;
> + slp_tree_nunits nunits{};
> unsigned tree_size = 0;
> slp_tree node = vect_build_slp_tree (loop_vinfo, scalar_stmts,
> group_size,
> - &max_nunits, matches, limit,
> + &nunits, matches, limit,
> &tree_size, bst_map);
> if (!node)
> return false;
> @@ -4955,7 +4953,7 @@ vect_analyze_slp_instance (vec_info *vinfo,
> /* Build the tree for the SLP instance. */
> unsigned int group_size = scalar_stmts.length ();
> bool *matches = XALLOCAVEC (bool, group_size);
> - poly_uint64 max_nunits = 1;
> + slp_tree_nunits nunits{};
> unsigned tree_size = 0;
> unsigned i;
>
> @@ -4967,26 +4965,28 @@ vect_analyze_slp_instance (vec_info *vinfo,
> }
> else
> node = vect_build_slp_tree (vinfo, scalar_stmts, group_size,
> - &max_nunits, matches, limit,
> + &nunits, matches, limit,
> &tree_size, bst_map);
> if (node != NULL)
> {
> /* Calculate the unrolling factor based on the smallest type. */
> poly_uint64 unrolling_factor
> - = calculate_unrolling_factor (max_nunits, group_size);
> + = calculate_unrolling_factor (nunits, group_size);
>
> if (maybe_ne (unrolling_factor, 1U)
> - && is_a <bb_vec_info> (vinfo))
> + && is_a<bb_vec_info> (vinfo))
> {
> unsigned HOST_WIDE_INT const_max_nunits;
> - if (!max_nunits.is_constant (&const_max_nunits)
> + if (!nunits.max.is_constant (&const_max_nunits)
> || const_max_nunits > group_size)
> {
> if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> "Build SLP failed: store group "
> - "size not a multiple of the vector size "
> - "in basic block SLP\n");
> + "size %u not a multiple of the vector "
> + "size " HOST_WIDE_INT_PRINT_UNSIGNED
> + " in basic block SLP\n",
> + group_size, estimated_poly_value (nunits.max));
> vect_free_slp_tree (node);
> return false;
> }
> @@ -5143,7 +5143,7 @@ vect_analyze_slp_instance (vec_info *vinfo,
> /* Analyze the stored values and pinch them together with
> a permute node so we can preserve the whole store group. */
> auto_vec<slp_tree> rhs_nodes;
> - poly_uint64 max_nunits = 1;
> + slp_tree_nunits nunits{};
>
> unsigned int rhs_common_nlanes = 0;
> unsigned int start = 0, end = i;
> @@ -5154,14 +5154,14 @@ vect_analyze_slp_instance (vec_info *vinfo,
> substmts.create (end - start);
> for (unsigned j = start; j < end; ++j)
> substmts.quick_push (scalar_stmts[j]);
> - max_nunits = 1;
> + nunits = {UINT64_MAX, 1};
> node = vect_build_slp_tree (vinfo, substmts, end - start,
> - &max_nunits,
> + &nunits,
> matches, limit, &tree_size,
> bst_map);
> if (node)
> {
> rhs_nodes.safe_push (node);
> - vect_update_max_nunits (&max_nunits, node->max_nunits);
> + vect_update_nunits (&nunits, node->nunits);
> if (start == 0)
> rhs_common_nlanes = SLP_TREE_LANES (node);
> else if (rhs_common_nlanes != SLP_TREE_LANES (node))
> @@ -5225,7 +5225,7 @@ vect_analyze_slp_instance (vec_info *vinfo,
> SLP_TREE_CHILDREN
> (rhs_nodes[0]).length ());
> SLP_TREE_VECTYPE (node) = SLP_TREE_VECTYPE (rhs_nodes[0]);
> - node->max_nunits = max_nunits;
> + node->nunits = nunits;
> node->ldst_lanes = true;
> SLP_TREE_CHILDREN (node)
> .reserve_exact (SLP_TREE_CHILDREN (rhs_nodes[0]).length ()
> @@ -5243,7 +5243,7 @@ vect_analyze_slp_instance (vec_info *vinfo,
> }
> else
> node = vect_build_slp_store_interleaving (rhs_nodes, scalar_stmts,
> - max_nunits);
> + nunits);
>
> while (!rhs_nodes.is_empty ())
> vect_free_slp_tree (rhs_nodes.pop ());
> @@ -5520,13 +5520,13 @@ vect_lower_load_permutations (loop_vec_info
> loop_vinfo,
> }
> for (unsigned i = 0; i < DR_GROUP_GAP (first); ++i)
> stmts.quick_push (NULL);
> - poly_uint64 max_nunits = 1;
> + slp_tree_nunits nunits{};
> bool *matches = XALLOCAVEC (bool, group_lanes);
> unsigned limit = 1;
> unsigned tree_size = 0;
> slp_tree l0 = vect_build_slp_tree (loop_vinfo, stmts,
> group_lanes,
> - &max_nunits, matches, &limit,
> + &nunits, matches, &limit,
> &tree_size, bst_map);
> gcc_assert (!SLP_TREE_LOAD_PERMUTATION (l0).exists ());
>
> @@ -8398,7 +8398,7 @@ vect_update_slp_vf_for_node (slp_tree node, poly_uint64
> &vf,
>
> /* We do not visit SLP nodes for constants or externals - those neither
> have a vector type set yet (vectorizable_* does this) nor do they
> - have max_nunits set. Instead we rely on internal nodes max_nunit
> + have nunits set. Instead we rely on internal nodes' nunits records
> to cover constant/external operands.
> Note that when we stop using fixed size vectors externs and constants
> shouldn't influence the (minimum) vectorization factor, instead
> @@ -8406,7 +8406,7 @@ vect_update_slp_vf_for_node (slp_tree node, poly_uint64
> &vf,
> assign vector types to constants and externals and cause iteration
> to a higher vectorization factor when required. */
> poly_uint64 node_vf
> - = calculate_unrolling_factor (node->max_nunits, SLP_TREE_LANES (node));
> + = calculate_unrolling_factor (node->nunits, SLP_TREE_LANES (node));
> vf = force_common_multiple (vf, node_vf);
>
> /* For permute nodes that are fed from externs or constants we have to
> @@ -8416,7 +8416,7 @@ vect_update_slp_vf_for_node (slp_tree node, poly_uint64
> &vf,
> if (SLP_TREE_DEF_TYPE (child) != vect_internal_def)
> {
> poly_uint64 child_vf
> - = calculate_unrolling_factor (node->max_nunits,
> + = calculate_unrolling_factor (node->nunits,
> SLP_TREE_LANES (child));
> vf = force_common_multiple (vf, child_vf);
> }
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 55f0bee0eb7..eda5275586d 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -254,6 +254,17 @@ typedef auto_vec<std::pair<unsigned, unsigned>, 16>
> auto_lane_permutation_t;
> typedef vec<unsigned> load_permutation_t;
> typedef auto_vec<unsigned, 16> auto_load_permutation_t;
>
> +struct slp_tree_nunits
> +{
> + slp_tree_nunits () = default;
> +
> + /* The minimum number of vector elements for a subtree.
> + UINT_MAX means unknown (no minimum recorded yet). */
> + poly_uint64 min = UINT64_MAX;
> + /* The maximum number of vector elements for a subtree. */
> + poly_uint64 max = 1;
> +};
> +
> struct vect_data {
> virtual ~vect_data () = default;
> };
> @@ -348,9 +359,9 @@ struct _slp_tree {
>
> /* Reference count in the SLP graph. */
> unsigned int refcnt;
> - /* The maximum number of vector elements for the subtree rooted
> + /* The minimum and maximum number of vector elements for the subtree rooted
> at this node. */
> - poly_uint64 max_nunits;
> + slp_tree_nunits nunits;
> /* The DEF type of this node. */
> enum vect_def_type def_type;
> /* The number of scalar lanes produced by this node. */
> @@ -2332,6 +2343,37 @@ vect_update_max_nunits (poly_uint64 *max_nunits, tree
> vectype)
> vect_update_max_nunits (max_nunits, TYPE_VECTOR_SUBPARTS (vectype));
> }
>
> +/* Update minimum and maximum unit count *NUNITS so that it accounts for
> + NEW_NUNITS. *NUNITS can be {MAX,1} if we haven't yet recorded anything.
> + If NEW_NUNITS is {MAX,1} then this function has no effect. */
> +
> +inline void
> +vect_update_nunits (slp_tree_nunits *nunits, slp_tree_nunits new_nunits)
> +{
> + vect_update_max_nunits (&nunits->max, new_nunits.max);
> +
> + /* We also want to know whether each individual choice of vector type
> + requires no "unrolling", which requires the minimum number of units.
> + All unit counts have the form vec_info::vector_size * X for some
> + rational X, therefore we know the values are ordered. */
> + if (!known_eq (new_nunits.min, UINT64_MAX))
> + nunits->min = known_eq (nunits->min, UINT64_MAX)
> + ? new_nunits.min
> + : ordered_min (nunits->min, new_nunits.min);
> +}
> +
> +/* Update maximum unit count *NUNITS so that it accounts for
> + the number of units in vector type VECTYPE. *NUNITS can be {MAX,1}
> + if we haven't yet recorded any vector types. */
> +
> +inline void
> +vect_update_nunits (slp_tree_nunits *nunits, tree vectype)
> +{
> + slp_tree_nunits new_nunits
> + = {TYPE_VECTOR_SUBPARTS (vectype), TYPE_VECTOR_SUBPARTS (vectype)};
> + vect_update_nunits (nunits, new_nunits);
> +}
> +
> /* Return the vectorization factor that should be used for costing
> purposes while vectorizing the loop described by LOOP_VINFO.
> Pick a reasonable estimate if the vectorization factor isn't
> --
> 2.43.0
>
