On Mon, Nov 24, 2025 at 8:01 PM Christopher Bazley <[email protected]> wrote:
>
> To decide whether to create a new SLP instance for BB SLP,
> vect_analyze_slp_instance will need the minimum number of lanes
> in the SLP tree, which must not be less than the group size
> (otherwise "unrolling" is required). All usage of max_nunits
> is therefore replaced with a new class that encapsulates
> both minimum and maximum.
>
> For now, the minimum value is unused.

Tracking minimum and maximum nunits on each SLP node is overkill, and
the way we accumulate those into a global minimum/maximum on an SLP
graph entry is not suitable for computing "unrolling" or a split point
for the entry node's group size.  I'd like to see us do away with
tracking nunits altogether.

In fact the maybe_ne (unrolling_factor, 1U) code in vect_analyze_slp_instance
and vect_build_slp_instance is currently unreachable due to the check
already present in vect_record_max_nunits.
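
To illustrate the unreachability argument with plain integers, here is
a rough sketch.  It is not GCC code: the toy_* helpers are hypothetical
stand-ins, uint64_t replaces poly_uint64 (so variable-length vectors are
ignored), and it assumes every node is checked against the same group
size as the instance.  Once every recorded nunits divides the group
size, the common multiple of the maximum and the group size is the
group size itself, so the factor can only be 1.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Toy stand-in for calculate_unrolling_factor with constant nunits:
// common multiple of nunits and group_size, divided by group_size.
static uint64_t
toy_unrolling_factor (uint64_t max_nunits, uint64_t group_size)
{
  uint64_t cm = std::lcm (max_nunits, group_size);
  assert (cm % group_size == 0);
  return cm / group_size;
}

// Toy stand-in for the BB check in vect_record_max_nunits: reject a
// vector type unless group_size is a multiple of its number of units,
// and only then fold it into the running maximum.
static bool
toy_record_max_nunits (uint64_t group_size, uint64_t nunits,
		       uint64_t *max_nunits)
{
  if (group_size % nunits != 0)
    return false;
  if (nunits > *max_nunits)
    *max_nunits = nunits;
  return true;
}

// Run the check over the nunits of every node of a (flattened) tree.
// Returns false as soon as one vector type is rejected.
static bool
toy_discover (uint64_t group_size, const std::vector<uint64_t> &node_nunits,
	      uint64_t *max_nunits)
{
  *max_nunits = 1;
  for (uint64_t n : node_nunits)
    if (!toy_record_max_nunits (group_size, n, max_nunits))
      return false;
  return true;
}
```

In this model any tree that survives toy_discover has an unrolling
factor of exactly 1, which is why a later "factor != 1" test for BB
vectorization has nothing left to catch.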

The only other use, in vect_update_slp_vf_for_node, should be replaced
by the local SLP_TREE_VECTYPE, as in the attached patch.  I've had this
on the "later" list for some time because, as long as we had both SLP
and non-SLP, the vect_maybe_update_slp_op_vectype code is not really
taking advantage of such local decisions.  It gets it "wrong" for
invariants, for example in vectorizable_conversion, leading to ICEs in
gcc.dg/vect/O3-pr87546.c and gcc.dg/vect/O3-vect-pr32243.c.  This could
be mitigated locally in vect_update_slp_vf_for_node, but the real fix
is to not require extra unrolling because of constant/invariant nodes.
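
As a rough illustration of deriving the VF from each node's own vector
type rather than from an accumulated max_nunits, consider the toy model
below.  It is a sketch under assumptions, not the attached patch:
toy_node and toy_update_vf are hypothetical, integers replace
poly_uint64, and a vectype_nunits of 0 models a constant/external node
that has no vector type yet and is simply skipped.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <unordered_set>
#include <vector>

// Hypothetical toy model of an SLP node; not GCC's _slp_tree.
struct toy_node
{
  uint64_t lanes;		// stands in for SLP_TREE_LANES
  uint64_t vectype_nunits;	// units of the node's own vector type,
				// 0 if no vector type is set
  std::vector<const toy_node *> children;
};

// Walk the graph once and fold every node's local unrolling factor
// into VF by taking the common multiple.
static void
toy_update_vf (const toy_node *node, uint64_t &vf,
	       std::unordered_set<const toy_node *> &visited)
{
  if (!node || !visited.insert (node).second)
    return;
  for (const toy_node *child : node->children)
    toy_update_vf (child, vf, visited);
  if (node->vectype_nunits == 0)
    return;
  // Local decision: only this node's vector type and lane count matter.
  uint64_t node_vf
    = std::lcm (node->vectype_nunits, node->lanes) / node->lanes;
  vf = std::lcm (vf, node_vf);
}
```

For a two-lane chain of nodes with 4, 8 and 4 units, the per-node
factors are 2, 4 and 2, so the walk settles on a VF of 4 without any
value being carried on the nodes.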

That said, I really want to get rid of max_nunits, not add to it.
Instead, for the purpose of splitting and of checking vector type
validity for BB vectorization, I'd resort to a walk over the graph, as
vect_update_slp_vf_for_node does, deciding on splitting and predication.
Tracking minimum nunits in addition to maximum goes in the wrong
direction.  For the purpose of this patch I suggest computing what you
need for the "minimum" with a new SLP graph walk instead.
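
A separate walk of that kind could look roughly like the sketch below.
All names are hypothetical and integers replace poly_uint64.  The
comparison against the group size follows the rule quoted from the
patch description (the minimum must not be less than the group size);
what to do when it is smaller (split, predicate, or fail) is left open
here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical toy model of an SLP node; not GCC's _slp_tree.
struct toy_slp_node
{
  uint64_t vectype_nunits;	// 0 if the node has no vector type
  std::vector<const toy_slp_node *> children;
};

// Compute the smallest nunits over all nodes reachable from ROOT,
// visiting shared nodes once.  Returns UINT64_MAX if no node has a
// vector type.
static uint64_t
toy_min_nunits (const toy_slp_node *root)
{
  uint64_t min_nunits = UINT64_MAX;
  std::unordered_set<const toy_slp_node *> visited;
  std::vector<const toy_slp_node *> worklist{root};
  while (!worklist.empty ())
    {
      const toy_slp_node *node = worklist.back ();
      worklist.pop_back ();
      if (!node || !visited.insert (node).second)
	continue;
      if (node->vectype_nunits != 0)
	min_nunits = std::min (min_nunits, node->vectype_nunits);
      for (const toy_slp_node *child : node->children)
	worklist.push_back (child);
    }
  return min_nunits;
}

// True if some vector type in the graph has fewer units than the
// group size, i.e. the case the patch description calls "unrolling".
static bool
toy_min_below_group_size (const toy_slp_node *root, uint64_t group_size)
{
  return toy_min_nunits (root) < group_size;
}
```

Because the result is computed on demand from the graph, nothing needs
to be stored on the nodes or threaded through vect_build_slp_tree.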

I have attached the patch that should ideally work (but doesn't for the reason
pointed out above).  It would make max_nunits unused.

Richard.

> gcc/ChangeLog:
>
>         * tree-vect-slp.cc (_slp_tree::_slp_tree): Initialize the lower and
>         upper bounds for an SLP tree node instead of only the upper bound.
>         The lower bound is UINT64_MAX to allow it to grow upwards.
>         (vect_record_max_nunits): Renamed as vect_record_nunits.
>         (vect_record_nunits): Change parameter type from poly_uint64 * to
>         slp_tree_nunits * and rename from max_nunits to nunits.
>         Call vect_update_nunits instead of vect_update_max_nunits.
>         (vect_build_slp_tree_1): Change parameter type from poly_uint64 * to
>         slp_tree_nunits * and rename from max_nunits to nunits.
>         Update for renaming of the vect_record_max_nunits function.
>         (vect_build_slp_tree_2): Change parameter type from poly_uint64 * to
>         slp_tree_nunits * and rename from max_nunits to nunits.
>         Substitute local variable this_nunits of type slp_tree_nunits for
>         this_max_nunits of type poly_uint64.
>         Update for renaming of the vect_record_max_nunits function and the
>         max_nunits member of _slp_tree.
>         (vect_build_slp_tree): Change parameter type from poly_uint64 * to
>         slp_tree_nunits * and rename from max_nunits to nunits.
>         Update for renaming of the max_nunits member of _slp_tree.
>         Substitute local variable this_nunits of type slp_tree_nunits for
>         this_max_nunits of type poly_uint64.
>         Rely on the bounds being initialized to the default member values.
>         Call vect_update_nunits instead of vect_update_max_nunits.
>         (vect_print_slp_tree): Dump nunits.min and nunits.max of the
>         _slp_tree instead of the max_nunits member they replace.
>         (calculate_unrolling_factor): Update parameter type from poly_uint64
>         to slp_tree_nunits. Use the nunits.max member.
>         (optimize_load_redistribution_1): Substitute local variable nunits of
>         type slp_tree_nunits for max_nunits of type poly_uint64.
>         Rely on the bounds being initialized to the default member values.
>         (vect_build_slp_store_interleaving): Update parameter type from
>         poly_uint64 to slp_tree_nunits and rename from max_nunits to nunits.
>         Update for renaming of the max_nunits member of _slp_tree.
>         (vect_build_slp_instance):  Substitute local variable nunits of type
>         slp_tree_nunits for max_nunits of type poly_uint64.
>         Rely on the bounds being initialized to the default member values.
>         (vect_analyze_slp_reduc_chain): As above.
>         (vect_analyze_slp_reduction): As above.
>         (vect_analyze_slp_instance): Substitute local variable nunits of type
>         slp_tree_nunits for max_nunits of type poly_uint64.
>         Rely on the bounds being initialized to the default member values but
>         also explicitly reinitialize bounds to their defaults before
>         each invocation of vect_build_slp_tree when trying to break a group
>         into pieces.
>         Improve a diagnostic message printed when this function fails.
>         Call vect_update_nunits instead of vect_update_max_nunits.
>         (vect_lower_load_permutations): Substitute local variable nunits of
>         type slp_tree_nunits for max_nunits of type poly_uint64.
>         Rely on the bounds being initialized to the default member values.
>         (vect_update_slp_vf_for_node): Update for renaming of the max_nunits
>         member of _slp_tree.
>         * tree-vectorizer.h (struct slp_tree_nunits): New type definition
>         to represent the minimum and maximum number of vector elements for
>         a subtree.
>         (struct _slp_tree): Replace the max_nunits member of type poly_uint64
>         with nunits of type slp_tree_nunits
>         (vect_update_nunits): New function to update the range stored in one
>         instance of slp_tree_nunits so that it becomes a superset of the range
>         stored in another.
>         Call vect_update_max_nunits internally so calls to the new function
>         can be substituted for calls to the existing function.
>         On return from vect_update_max_nunits, reduce the minimum bound if
>         applicable. Both minima are compared explicitly against UINT64_MAX
>         (the initial value, meaning 'empty') to avoid invalid use of
>         ordered_min. (UINT64_MAX is huge but not polynomial.)
>
> ---
>  gcc/tree-vect-slp.cc  | 162 +++++++++++++++++++++---------------------
>  gcc/tree-vectorizer.h |  46 +++++++++++-
>  2 files changed, 125 insertions(+), 83 deletions(-)
>
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 0ab15fde469..2369319b6ea 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -130,7 +130,7 @@ _slp_tree::_slp_tree ()
>    this->cycle_info.reduc_idx = -1;
>    SLP_TREE_REF_COUNT (this) = 1;
>    this->failed = NULL;
> -  this->max_nunits = 1;
> +  this->nunits = {UINT64_MAX, 1};
>    this->lanes = 0;
>    SLP_TREE_TYPE (this) = undef_vec_info_type;
>    this->data = NULL;
> @@ -1051,14 +1051,14 @@ compatible_calls_p (gcall *call1, gcall *call2, bool allow_two_operators)
>  /* A subroutine of vect_build_slp_tree for checking VECTYPE, which is the
>     caller's attempt to find the vector type in STMT_INFO with the narrowest
>     element type.  Return true if VECTYPE is nonnull and if it is valid
> -   for STMT_INFO.  When returning true, update MAX_NUNITS to reflect the
> -   number of units in VECTYPE.  GROUP_SIZE and MAX_NUNITS are as for
> +   for STMT_INFO.  When returning true, update *NUNITS to reflect the
> +   number of units in VECTYPE.  GROUP_SIZE and NUNITS are as for
>     vect_build_slp_tree.  */
>
>  static bool
> -vect_record_max_nunits (vec_info *vinfo, stmt_vec_info stmt_info,
> -                       unsigned int group_size,
> -                       tree vectype, poly_uint64 *max_nunits)
> +vect_record_nunits (vec_info *vinfo, stmt_vec_info stmt_info,
> +                   unsigned int group_size, tree vectype,
> +                   slp_tree_nunits *nunits)
>  {
>    if (!vectype)
>      {
> @@ -1071,7 +1071,7 @@ vect_record_max_nunits (vec_info *vinfo, stmt_vec_info stmt_info,
>      }
>
>    /* If populating the vector type requires unrolling then fail
> -     before adjusting *max_nunits for basic-block vectorization.  */
> +     before adjusting *nunits for basic-block vectorization.  */
>    if (is_a <bb_vec_info> (vinfo)
>        && !multiple_p (group_size, TYPE_VECTOR_SUBPARTS (vectype)))
>      {
> @@ -1084,7 +1084,7 @@ vect_record_max_nunits (vec_info *vinfo, stmt_vec_info stmt_info,
>      }
>
>    /* In case of multiple types we need to detect the smallest type.  */
> -  vect_update_max_nunits (max_nunits, vectype);
> +  vect_update_nunits (nunits, vectype);
>    return true;
>  }
>
> @@ -1105,7 +1105,7 @@ vect_record_max_nunits (vec_info *vinfo, stmt_vec_info stmt_info,
>  static bool
>  vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
>                        vec<stmt_vec_info> stmts, unsigned int group_size,
> -                      poly_uint64 *max_nunits, bool *matches,
> +                      slp_tree_nunits *nunits, bool *matches,
>                        bool *two_operators, tree *node_vectype)
>  {
>    unsigned int i;
> @@ -1145,8 +1145,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
>       as if nunits was not an issue.  This allows splitting of groups
>       to happen.  */
>    if (nunits_vectype
> -      && !vect_record_max_nunits (vinfo, first_stmt_info, group_size,
> -                                 nunits_vectype, max_nunits))
> +      && !vect_record_nunits (vinfo, first_stmt_info, group_size,
> +                             nunits_vectype, nunits))
>      {
>        gcc_assert (is_a <bb_vec_info> (vinfo));
>        maybe_soft_fail = true;
> @@ -1828,14 +1828,14 @@ vect_slp_linearize_chain (vec_info *vinfo,
>  static slp_tree
>  vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
>                        vec<stmt_vec_info> stmts, unsigned int group_size,
> -                      poly_uint64 *max_nunits,
> +                      slp_tree_nunits *nunits,
>                        bool *matches, unsigned *limit, unsigned *tree_size,
>                        scalar_stmts_to_slp_tree_map_t *bst_map);
>
>  static slp_tree
>  vect_build_slp_tree (vec_info *vinfo,
>                      vec<stmt_vec_info> stmts, unsigned int group_size,
> -                    poly_uint64 *max_nunits,
> +                    slp_tree_nunits *nunits,
>                      bool *matches, unsigned *limit, unsigned *tree_size,
>                      scalar_stmts_to_slp_tree_map_t *bst_map)
>  {
> @@ -1848,7 +1848,7 @@ vect_build_slp_tree (vec_info *vinfo,
>        if (!(*leader)->failed)
>         {
>           SLP_TREE_REF_COUNT (*leader)++;
> -         vect_update_max_nunits (max_nunits, (*leader)->max_nunits);
> +         vect_update_nunits (nunits, (*leader)->nunits);
>           stmts.release ();
>           return *leader;
>         }
> @@ -1882,9 +1882,9 @@ vect_build_slp_tree (vec_info *vinfo,
>      dump_printf_loc (MSG_NOTE, vect_location,
>                      "starting SLP discovery for node %p\n", (void *) res);
>
> -  poly_uint64 this_max_nunits = 1;
> +  slp_tree_nunits this_nunits{};
>    slp_tree res_ = vect_build_slp_tree_2 (vinfo, res, stmts, group_size,
> -                                       &this_max_nunits,
> +                                       &this_nunits,
>                                         matches, limit, tree_size, bst_map);
>    if (!res_)
>      {
> @@ -1913,8 +1913,8 @@ vect_build_slp_tree (vec_info *vinfo,
>                          "SLP discovery for node %p succeeded\n",
>                          (void *) res);
>        gcc_assert (res_ == res);
> -      res->max_nunits = this_max_nunits;
> -      vect_update_max_nunits (max_nunits, this_max_nunits);
> +      res->nunits = this_nunits;
> +      vect_update_nunits (nunits, this_nunits);
>        /* Keep a reference for the bst_map use.  */
>        SLP_TREE_REF_COUNT (res)++;
>      }
> @@ -1972,12 +1972,12 @@ vect_slp_build_two_operator_nodes (slp_tree perm, tree vectype,
>  static slp_tree
>  vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
>                        vec<stmt_vec_info> stmts, unsigned int group_size,
> -                      poly_uint64 *max_nunits,
> +                      slp_tree_nunits *nunits,
>                        bool *matches, unsigned *limit, unsigned *tree_size,
>                        scalar_stmts_to_slp_tree_map_t *bst_map)
>  {
>    unsigned nops, i, this_tree_size = 0;
> -  poly_uint64 this_max_nunits = *max_nunits;
> +  slp_tree_nunits this_nunits = *nunits;
>
>    matches[0] = false;
>
> @@ -2003,8 +2003,7 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
>         tree scalar_type = TREE_TYPE (PHI_RESULT (stmt));
>         tree vectype = get_vectype_for_scalar_type (vinfo, scalar_type,
>                                                     group_size);
> -       if (!vect_record_max_nunits (vinfo, stmt_info, group_size, vectype,
> -                                    max_nunits))
> +       if (!vect_record_nunits (vinfo, stmt_info, group_size, vectype, nunits))
>           return NULL;
>
>         vect_def_type def_type = STMT_VINFO_DEF_TYPE (stmt_info);
> @@ -2057,7 +2056,7 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
>    unsigned char *swap = XALLOCAVEC (unsigned char, group_size);
>    tree vectype = NULL_TREE;
>    if (!vect_build_slp_tree_1 (vinfo, swap, stmts, group_size,
> -                             &this_max_nunits, matches, &two_operators,
> +                             &this_nunits, matches, &two_operators,
>                               &vectype))
>      return NULL;
>
> @@ -2069,7 +2068,7 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
>         gcc_assert (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info)));
>        else
>         {
> -         *max_nunits = this_max_nunits;
> +         *nunits = this_nunits;
>           (*tree_size)++;
>           node = vect_create_new_slp_node (node, stmts, 0);
>           SLP_TREE_VECTYPE (node) = vectype;
> @@ -2154,7 +2153,7 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
>                   bool *matches2 = XALLOCAVEC (bool, dr_group_size);
>                   slp_tree unperm_load
>                     = vect_build_slp_tree (vinfo, stmts2, dr_group_size,
> -                                          &this_max_nunits, matches2, limit,
> +                                          &this_nunits, matches2, limit,
>                                            &this_tree_size, bst_map);
>                   /* When we are able to do the full masked load emit that
>                      followed by 'node' being the desired final permutation.  */
> @@ -2457,7 +2456,7 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
>                         else
>                           op_stmts.quick_push (NULL);
>                       child = vect_build_slp_tree (vinfo, op_stmts,
> -                                                  group_size, &this_max_nunits,
> +                                                  group_size, &this_nunits,
>                                                    matches, limit,
>                                                    &this_tree_size, bst_map);
>                       /* ???  We're likely getting too many fatal mismatches
> @@ -2613,7 +2612,7 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
>               children[i] = child;
>             }
>           *tree_size += this_tree_size + 1;
> -         *max_nunits = this_max_nunits;
> +         *nunits = this_nunits;
>           while (!chains.is_empty ())
>             chains.pop ().release ();
>           return node;
> @@ -2892,7 +2891,7 @@ out:
>           def_stmts2.create (1);
>           def_stmts2.quick_push (oprnd_info->def_stmts[0]);
>           child = vect_build_slp_tree (vinfo, def_stmts2, 1,
> -                                      &this_max_nunits,
> +                                      &this_nunits,
>                                        matches, limit,
>                                        &this_tree_size, bst_map);
>           if (child)
> @@ -2910,7 +2909,7 @@ out:
>                     .quick_push (std::make_pair (0u, 0u));
>                 }
>               SLP_TREE_CHILDREN (pnode).quick_push (child);
> -             pnode->max_nunits = child->max_nunits;
> +             pnode->nunits = child->nunits;
>               children.safe_push (pnode);
>               oprnd_info->def_stmts = vNULL;
>               continue;
> @@ -2920,7 +2919,7 @@ out:
>         }
>
>        if ((child = vect_build_slp_tree (vinfo, oprnd_info->def_stmts,
> -                                       group_size, &this_max_nunits,
> +                                       group_size, &this_nunits,
>                                         matches, limit,
>                                         &this_tree_size, bst_map)) != NULL)
>         {
> @@ -3009,7 +3008,7 @@ out:
>           /* And try again with scratch 'matches' ... */
>           bool *tem = XALLOCAVEC (bool, group_size);
>           if ((child = vect_build_slp_tree (vinfo, oprnd_info->def_stmts,
> -                                           group_size, &this_max_nunits,
> +                                           group_size, &this_nunits,
>                                             tem, limit,
>                                             &this_tree_size, bst_map)) != NULL)
>             {
> @@ -3115,7 +3114,7 @@ fail:
>      }
>
>    *tree_size += this_tree_size + 1;
> -  *max_nunits = this_max_nunits;
> +  *nunits = this_nunits;
>
>    if (two_operators)
>      {
> @@ -3261,16 +3260,15 @@ vect_print_slp_tree (dump_flags_t dump_kind, dump_location_t loc,
>
>    dump_metadata_t metadata (dump_kind, loc.get_impl_location ());
>    dump_user_location_t user_loc = loc.get_user_location ();
> -  dump_printf_loc (metadata, user_loc,
> -                  "node%s %p (max_nunits=" HOST_WIDE_INT_PRINT_UNSIGNED
> -                  ", refcnt=%u)",
> -                  SLP_TREE_DEF_TYPE (node) == vect_external_def
> -                  ? " (external)"
> -                  : (SLP_TREE_DEF_TYPE (node) == vect_constant_def
> -                     ? " (constant)"
> -                     : ""), (void *) node,
> -                  estimated_poly_value (node->max_nunits),
> -                                        SLP_TREE_REF_COUNT (node));
> +  dump_printf_loc (
> +    metadata, user_loc,
> +    "node%s %p (nunits.min=" HOST_WIDE_INT_PRINT_UNSIGNED
> +    ", nunits.max=" HOST_WIDE_INT_PRINT_UNSIGNED ", refcnt=%u)",
> +    SLP_TREE_DEF_TYPE (node) == vect_external_def
> +      ? " (external)"
> +      : (SLP_TREE_DEF_TYPE (node) == vect_constant_def ? " (constant)" : ""),
> +    (void *) node, estimated_poly_value (node->nunits.min),
> +    estimated_poly_value (node->nunits.max), SLP_TREE_REF_COUNT (node));
>    if (SLP_TREE_VECTYPE (node))
>      dump_printf (metadata, " %T", SLP_TREE_VECTYPE (node));
>    dump_printf (metadata, "%s",
> @@ -3637,9 +3635,9 @@ vect_split_slp_store_group (stmt_vec_info first_vinfo, unsigned group1_size)
>     statements and a vector of NUNITS elements.  */
>
>  static poly_uint64
> -calculate_unrolling_factor (poly_uint64 nunits, unsigned int group_size)
> +calculate_unrolling_factor (slp_tree_nunits nunits, unsigned int group_size)
>  {
> -  return exact_div (common_multiple (nunits, group_size), group_size);
> +  return exact_div (common_multiple (nunits.max, group_size), group_size);
>  }
>
>  /* Helper that checks to see if a node is a load node.  */
> @@ -3701,9 +3699,9 @@ optimize_load_redistribution_1 (scalar_stmts_to_slp_tree_map_t *bst_map,
>                          (void *) root);
>
>        bool *matches = XALLOCAVEC (bool, group_size);
> -      poly_uint64 max_nunits = 1;
> +      slp_tree_nunits nunits{};
>        unsigned tree_size = 0, limit = 1;
> -      node = vect_build_slp_tree (vinfo, stmts, group_size, &max_nunits,
> +      node = vect_build_slp_tree (vinfo, stmts, group_size, &nunits,
>                                   matches, &limit, &tree_size, bst_map);
>        if (!node)
>         stmts.release ();
> @@ -3886,14 +3884,14 @@ vect_analyze_slp_instance (vec_info *vinfo,
>  static slp_tree
>  vect_build_slp_store_interleaving (vec<slp_tree> &rhs_nodes,
>                                    vec<stmt_vec_info> &scalar_stmts,
> -                                  poly_uint64 max_nunits)
> +                                  slp_tree_nunits nunits)
>  {
>    unsigned int group_size = scalar_stmts.length ();
>    slp_tree node = vect_create_new_slp_node (scalar_stmts,
>                                             SLP_TREE_CHILDREN
>                                               (rhs_nodes[0]).length ());
>    SLP_TREE_VECTYPE (node) = SLP_TREE_VECTYPE (rhs_nodes[0]);
> -  node->max_nunits = max_nunits;
> +  node->nunits = nunits;
>    for (unsigned l = 0;
>         l < SLP_TREE_CHILDREN (rhs_nodes[0]).length (); ++l)
>      {
> @@ -3903,7 +3901,7 @@ vect_build_slp_store_interleaving (vec<slp_tree> &rhs_nodes,
>        SLP_TREE_CHILDREN (node).quick_push (perm);
>        SLP_TREE_LANE_PERMUTATION (perm).create (group_size);
>        SLP_TREE_VECTYPE (perm) = SLP_TREE_VECTYPE (node);
> -      perm->max_nunits = max_nunits;
> +      perm->nunits = nunits;
>        SLP_TREE_LANES (perm) = group_size;
>        /* ???  We should set this NULL but that's not expected.  */
>        SLP_TREE_REPRESENTATIVE (perm)
> @@ -3959,7 +3957,7 @@ vect_build_slp_store_interleaving (vec<slp_tree> &rhs_nodes,
>               SLP_TREE_LANES (permab) = n;
>               SLP_TREE_LANE_PERMUTATION (permab).create (n);
>               SLP_TREE_VECTYPE (permab) = SLP_TREE_VECTYPE (perm);
> -             permab->max_nunits = max_nunits;
> +             permab->nunits = nunits;
>               /* ???  Should be NULL but that's not expected.  */
>               SLP_TREE_REPRESENTATIVE (permab) = SLP_TREE_REPRESENTATIVE (perm);
>               SLP_TREE_CHILDREN (permab).quick_push (a);
> @@ -4030,7 +4028,7 @@ vect_build_slp_store_interleaving (vec<slp_tree> &rhs_nodes,
>           SLP_TREE_LANES (permab) = n;
>           SLP_TREE_LANE_PERMUTATION (permab).create (n);
>           SLP_TREE_VECTYPE (permab) = SLP_TREE_VECTYPE (perm);
> -         permab->max_nunits = max_nunits;
> +         permab->nunits = nunits;
>           /* ???  Should be NULL but that's not expected.  */
>           SLP_TREE_REPRESENTATIVE (permab) = SLP_TREE_REPRESENTATIVE (perm);
>           SLP_TREE_CHILDREN (permab).quick_push (a);
> @@ -4115,7 +4113,7 @@ vect_build_slp_instance (vec_info *vinfo,
>    /* Build the tree for the SLP instance.  */
>    unsigned int group_size = scalar_stmts.length ();
>    bool *matches = XALLOCAVEC (bool, group_size);
> -  poly_uint64 max_nunits = 1;
> +  slp_tree_nunits nunits{};
>    unsigned tree_size = 0;
>
>    slp_tree node = NULL;
> @@ -4126,19 +4124,19 @@ vect_build_slp_instance (vec_info *vinfo,
>      }
>    else
>      node = vect_build_slp_tree (vinfo, scalar_stmts, group_size,
> -                               &max_nunits, matches, limit,
> +                               &nunits, matches, limit,
>                                 &tree_size, bst_map);
>    if (node != NULL)
>      {
>        /* Calculate the unrolling factor based on the smallest type.  */
>        poly_uint64 unrolling_factor
> -       = calculate_unrolling_factor (max_nunits, group_size);
> +       = calculate_unrolling_factor (nunits, group_size);
>
>        if (maybe_ne (unrolling_factor, 1U)
>           && is_a <bb_vec_info> (vinfo))
>         {
>           unsigned HOST_WIDE_INT const_max_nunits;
> -         if (!max_nunits.is_constant (&const_max_nunits)
> +         if (!nunits.max.is_constant (&const_max_nunits)
>               || const_max_nunits > group_size)
>             {
>               if (dump_enabled_p ())
> @@ -4376,10 +4374,10 @@ vect_analyze_slp_reduc_chain (loop_vec_info vinfo,
>
>        unsigned int group_size = scalar_stmts.length ();
>        bool *matches = XALLOCAVEC (bool, group_size);
> -      poly_uint64 max_nunits = 1;
> +      slp_tree_nunits nunits{};
>        unsigned tree_size = 0;
>        slp_tree node = vect_build_slp_tree (vinfo, scalar_stmts, group_size,
> -                                          &max_nunits, matches, limit,
> +                                          &nunits, matches, limit,
>                                            &tree_size, bst_map);
>        if (!node)
>         {
> @@ -4519,7 +4517,7 @@ vect_analyze_slp_reduc_chain (loop_vec_info vinfo,
>    /* Build the tree for the SLP instance.  */
>    unsigned int group_size = scalar_stmts.length ();
>    bool *matches = XALLOCAVEC (bool, group_size);
> -  poly_uint64 max_nunits = 1;
> +  slp_tree_nunits nunits{};
>    unsigned tree_size = 0;
>
>    /* ???  We need this only for SLP discovery.  */
> @@ -4527,7 +4525,7 @@ vect_analyze_slp_reduc_chain (loop_vec_info vinfo,
>      REDUC_GROUP_FIRST_ELEMENT (scalar_stmts[i]) = scalar_stmts[0];
>
>    slp_tree node = vect_build_slp_tree (vinfo, scalar_stmts, group_size,
> -                                      &max_nunits, matches, limit,
> +                                      &nunits, matches, limit,
>                                        &tree_size, bst_map);
>
>    for (unsigned i = 0; i < scalar_stmts.length (); ++i)
> @@ -4669,11 +4667,11 @@ vect_analyze_slp_reduction (loop_vec_info vinfo,
>    /* Build the tree for the SLP instance.  */
>    unsigned int group_size = scalar_stmts.length ();
>    bool *matches = XALLOCAVEC (bool, group_size);
> -  poly_uint64 max_nunits = 1;
> +  slp_tree_nunits nunits{};
>    unsigned tree_size = 0;
>
>    slp_tree node = vect_build_slp_tree (vinfo, scalar_stmts, group_size,
> -                                      &max_nunits, matches, limit,
> +                                      &nunits, matches, limit,
>                                        &tree_size, bst_map);
>    if (node != NULL)
>      {
> @@ -4738,11 +4736,11 @@ vect_analyze_slp_reduction_group (loop_vec_info loop_vinfo,
>    unsigned int group_size = scalar_stmts.length ();
>    if (!matches)
>      matches = XALLOCAVEC (bool, group_size);
> -  poly_uint64 max_nunits = 1;
> +  slp_tree_nunits nunits{};
>    unsigned tree_size = 0;
>    slp_tree node = vect_build_slp_tree (loop_vinfo, scalar_stmts,
>                                        group_size,
> -                                      &max_nunits, matches, limit,
> +                                      &nunits, matches, limit,
>                                        &tree_size, bst_map);
>    if (!node)
>      return false;
> @@ -4955,7 +4953,7 @@ vect_analyze_slp_instance (vec_info *vinfo,
>    /* Build the tree for the SLP instance.  */
>    unsigned int group_size = scalar_stmts.length ();
>    bool *matches = XALLOCAVEC (bool, group_size);
> -  poly_uint64 max_nunits = 1;
> +  slp_tree_nunits nunits{};
>    unsigned tree_size = 0;
>    unsigned i;
>
> @@ -4967,26 +4965,28 @@ vect_analyze_slp_instance (vec_info *vinfo,
>      }
>    else
>      node = vect_build_slp_tree (vinfo, scalar_stmts, group_size,
> -                               &max_nunits, matches, limit,
> +                               &nunits, matches, limit,
>                                 &tree_size, bst_map);
>    if (node != NULL)
>      {
>        /* Calculate the unrolling factor based on the smallest type.  */
>        poly_uint64 unrolling_factor
> -       = calculate_unrolling_factor (max_nunits, group_size);
> +       = calculate_unrolling_factor (nunits, group_size);
>
>        if (maybe_ne (unrolling_factor, 1U)
> -         && is_a <bb_vec_info> (vinfo))
> +         && is_a<bb_vec_info> (vinfo))
>         {
>           unsigned HOST_WIDE_INT const_max_nunits;
> -         if (!max_nunits.is_constant (&const_max_nunits)
> +         if (!nunits.max.is_constant (&const_max_nunits)
>               || const_max_nunits > group_size)
>             {
>               if (dump_enabled_p ())
>                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>                                  "Build SLP failed: store group "
> -                                "size not a multiple of the vector size "
> -                                "in basic block SLP\n");
> +                                "size %u not a multiple of the vector "
> +                                "size " HOST_WIDE_INT_PRINT_UNSIGNED
> +                                " in basic block SLP\n",
> +                                group_size, estimated_poly_value (nunits.max));
>               vect_free_slp_tree (node);
>               return false;
>             }
> @@ -5143,7 +5143,7 @@ vect_analyze_slp_instance (vec_info *vinfo,
>           /* Analyze the stored values and pinch them together with
>              a permute node so we can preserve the whole store group.  */
>           auto_vec<slp_tree> rhs_nodes;
> -         poly_uint64 max_nunits = 1;
> +         slp_tree_nunits nunits{};
>
>           unsigned int rhs_common_nlanes = 0;
>           unsigned int start = 0, end = i;
> @@ -5154,14 +5154,14 @@ vect_analyze_slp_instance (vec_info *vinfo,
>               substmts.create (end - start);
>               for (unsigned j = start; j < end; ++j)
>                 substmts.quick_push (scalar_stmts[j]);
> -             max_nunits = 1;
> +             nunits = {UINT64_MAX, 1};
>               node = vect_build_slp_tree (vinfo, substmts, end - start,
> -                                         &max_nunits,
> +                                         &nunits,
>                                           matches, limit, &tree_size, bst_map);
>               if (node)
>                 {
>                   rhs_nodes.safe_push (node);
> -                 vect_update_max_nunits (&max_nunits, node->max_nunits);
> +                 vect_update_nunits (&nunits, node->nunits);
>                   if (start == 0)
>                     rhs_common_nlanes = SLP_TREE_LANES (node);
>                   else if (rhs_common_nlanes != SLP_TREE_LANES (node))
> @@ -5225,7 +5225,7 @@ vect_analyze_slp_instance (vec_info *vinfo,
>                                                SLP_TREE_CHILDREN
>                                                  (rhs_nodes[0]).length ());
>               SLP_TREE_VECTYPE (node) = SLP_TREE_VECTYPE (rhs_nodes[0]);
> -             node->max_nunits = max_nunits;
> +             node->nunits = nunits;
>               node->ldst_lanes = true;
>               SLP_TREE_CHILDREN (node)
>                 .reserve_exact (SLP_TREE_CHILDREN (rhs_nodes[0]).length ()
> @@ -5243,7 +5243,7 @@ vect_analyze_slp_instance (vec_info *vinfo,
>             }
>           else
>             node = vect_build_slp_store_interleaving (rhs_nodes, scalar_stmts,
> -                                                     max_nunits);
> +                                                     nunits);
>
>           while (!rhs_nodes.is_empty ())
>             vect_free_slp_tree (rhs_nodes.pop ());
> @@ -5520,13 +5520,13 @@ vect_lower_load_permutations (loop_vec_info loop_vinfo,
>         }
>        for (unsigned i = 0; i < DR_GROUP_GAP (first); ++i)
>         stmts.quick_push (NULL);
> -      poly_uint64 max_nunits = 1;
> +      slp_tree_nunits nunits{};
>        bool *matches = XALLOCAVEC (bool, group_lanes);
>        unsigned limit = 1;
>        unsigned tree_size = 0;
>        slp_tree l0 = vect_build_slp_tree (loop_vinfo, stmts,
>                                          group_lanes,
> -                                        &max_nunits, matches, &limit,
> +                                        &nunits, matches, &limit,
>                                          &tree_size, bst_map);
>        gcc_assert (!SLP_TREE_LOAD_PERMUTATION (l0).exists ());
>
> @@ -8398,7 +8398,7 @@ vect_update_slp_vf_for_node (slp_tree node, poly_uint64 &vf,
>
>    /* We do not visit SLP nodes for constants or externals - those neither
>       have a vector type set yet (vectorizable_* does this) nor do they
> -     have max_nunits set.  Instead we rely on internal nodes max_nunit
> +     have nunits set.  Instead we rely on internal nodes' nunits records
>       to cover constant/external operands.
>       Note that when we stop using fixed size vectors externs and constants
>       shouldn't influence the (minimum) vectorization factor, instead
> @@ -8406,7 +8406,7 @@ vect_update_slp_vf_for_node (slp_tree node, poly_uint64 &vf,
>       assign vector types to constants and externals and cause iteration
>       to a higher vectorization factor when required.  */
>    poly_uint64 node_vf
> -    = calculate_unrolling_factor (node->max_nunits, SLP_TREE_LANES (node));
> +    = calculate_unrolling_factor (node->nunits, SLP_TREE_LANES (node));
>    vf = force_common_multiple (vf, node_vf);
>
>    /* For permute nodes that are fed from externs or constants we have to
> @@ -8416,7 +8416,7 @@ vect_update_slp_vf_for_node (slp_tree node, poly_uint64 &vf,
>        if (SLP_TREE_DEF_TYPE (child) != vect_internal_def)
>         {
>           poly_uint64 child_vf
> -           = calculate_unrolling_factor (node->max_nunits,
> +           = calculate_unrolling_factor (node->nunits,
>                                           SLP_TREE_LANES (child));
>           vf = force_common_multiple (vf, child_vf);
>         }
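
For reference, the arithmetic behind the two hunks above can be modelled
with plain integers.  This is only a sketch: the names below are
hypothetical, std::uint64_t stands in for poly_uint64, and it assumes the
unrolling factor is the common multiple of the unit count and the lane
count divided by the lane count, with the per-node factors then folded
into VF by taking a common multiple.

```cpp
#include <cstdint>

// Greatest common divisor, written out so the sketch has no library needs.
inline std::uint64_t sketch_gcd(std::uint64_t a, std::uint64_t b) {
  while (b != 0) {
    std::uint64_t t = a % b;
    a = b;
    b = t;
  }
  return a;
}

// Least common multiple of two non-zero values.
inline std::uint64_t sketch_common_multiple(std::uint64_t a, std::uint64_t b) {
  return a / sketch_gcd(a, b) * b;
}

// Hypothetical scalar model of the unrolling factor: how many copies of a
// LANES-wide group are needed to fill whole vectors of NUNITS elements.
inline std::uint64_t sketch_unrolling_factor(std::uint64_t nunits,
                                             std::uint64_t lanes) {
  return sketch_common_multiple(nunits, lanes) / lanes;
}

// Hypothetical scalar model of folding one node's factor into the running VF.
inline void sketch_update_vf(std::uint64_t &vf, std::uint64_t node_vf) {
  vf = sketch_common_multiple(vf, node_vf);
}
```

With 4-element vectors, a 2-lane node needs a factor of 2, while 4-lane
and 8-lane nodes need no unrolling at all (factor 1).
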
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 55f0bee0eb7..eda5275586d 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -254,6 +254,17 @@ typedef auto_vec<std::pair<unsigned, unsigned>, 16> auto_lane_permutation_t;
>  typedef vec<unsigned> load_permutation_t;
>  typedef auto_vec<unsigned, 16> auto_load_permutation_t;
>
> +struct slp_tree_nunits
> +{
> +  slp_tree_nunits () = default;
> +
> +  /* The minimum number of vector elements for a subtree.
> +     UINT64_MAX means unknown (no minimum recorded yet).  */
> +  poly_uint64 min = UINT64_MAX;
> +  /* The maximum number of vector elements for a subtree.  */
> +  poly_uint64 max = 1;
> +};
> +
>  struct vect_data {
>    virtual ~vect_data () = default;
>  };
> @@ -348,9 +359,9 @@ struct _slp_tree {
>
>    /* Reference count in the SLP graph.  */
>    unsigned int refcnt;
> -  /* The maximum number of vector elements for the subtree rooted
> +  /* The minimum and maximum number of vector elements for the subtree rooted
>       at this node.  */
> -  poly_uint64 max_nunits;
> +  slp_tree_nunits nunits;
>    /* The DEF type of this node.  */
>    enum vect_def_type def_type;
>    /* The number of scalar lanes produced by this node.  */
> @@ -2332,6 +2343,37 @@ vect_update_max_nunits (poly_uint64 *max_nunits, tree vectype)
>    vect_update_max_nunits (max_nunits, TYPE_VECTOR_SUBPARTS (vectype));
>  }
>
> +/* Update minimum and maximum unit count *NUNITS so that it accounts for
> +   NEW_NUNITS.  *NUNITS can be {MAX,1} if we haven't yet recorded anything.
> +   If NEW_NUNITS is {MAX,1} then this function has no effect.  */
> +
> +inline void
> +vect_update_nunits (slp_tree_nunits *nunits, slp_tree_nunits new_nunits)
> +{
> +  vect_update_max_nunits (&nunits->max, new_nunits.max);
> +
> +  /* We also want to know whether each individual choice of vector type
> +     requires no "unrolling", which requires the minimum number of units.
> +     All unit counts have the form vec_info::vector_size * X for some
> +     rational X, therefore we know the values are ordered.  */
> +  if (!known_eq (new_nunits.min, UINT64_MAX))
> +    nunits->min = known_eq (nunits->min, UINT64_MAX)
> +                   ? new_nunits.min
> +                   : ordered_min (nunits->min, new_nunits.min);
> +}
> +
> +/* Update minimum and maximum unit count *NUNITS so that it accounts for
> +   the number of units in vector type VECTYPE.  *NUNITS can be {MAX,1}
> +   if we haven't yet recorded any vector types.  */
> +
> +inline void
> +vect_update_nunits (slp_tree_nunits *nunits, tree vectype)
> +{
> +  slp_tree_nunits new_nunits
> +    = {TYPE_VECTOR_SUBPARTS (vectype), TYPE_VECTOR_SUBPARTS (vectype)};
> +  vect_update_nunits (nunits, new_nunits);
> +}
> +
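
To make the sentinel handling in the two overloads above concrete, here
is a small scalar model.  It is a sketch only: the names are hypothetical,
std::uint64_t stands in for poly_uint64, and std::min/std::max stand in
for ordered_min and the maximum update.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical scalar model of slp_tree_nunits.
// min == UINT64_MAX means "no minimum recorded yet".
struct sketch_nunits {
  std::uint64_t min = UINT64_MAX;
  std::uint64_t max = 1;
};

// Model of the two-argument update: fold OTHER into ACC.
// An OTHER that has recorded nothing leaves ACC unchanged.
inline void sketch_update_nunits(sketch_nunits &acc,
                                 const sketch_nunits &other) {
  acc.max = std::max(acc.max, other.max);
  if (other.min != UINT64_MAX)
    acc.min = (acc.min == UINT64_MAX) ? other.min
                                      : std::min(acc.min, other.min);
}

// Model of the vector-type overload: a single vector type with COUNT
// elements contributes COUNT as both minimum and maximum.
inline void sketch_update_nunits(sketch_nunits &acc, std::uint64_t count) {
  sketch_nunits one;
  one.min = count;
  one.max = count;
  sketch_update_nunits(acc, one);
}
```

In this scalar model the sentinel already behaves as the identity for
std::min, so the explicit checks are redundant.  They are kept to mirror
the patch, where I would expect them to be needed because a constant
sentinel need not be ordered against a variable-length poly_uint64.
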
>  /* Return the vectorization factor that should be used for costing
>     purposes while vectorizing the loop described by LOOP_VINFO.
>     Pick a reasonable estimate if the vectorization factor isn't
> --
> 2.43.0
>

Attachment: p
Description: Binary data
