> -----Original Message-----
> From: Richard Biener <[email protected]>
> Sent: 06 May 2026 14:03
> To: [email protected]
> Cc: Tamar Christina <[email protected]>
> Subject: [PATCH] [RFC][v2] Add vector_costs::add_slp_cost grouping hook
> 
> The following simplifies the earlier RFC for making it easier
> for the target to correlate multiple cost events created from
> a single SLP operation.  Instead of changing where we record
> costs this patch only adjusts the submission part.
> 
> Targets wanting to take advantage of this can implement
> add_slp_cost and handle all or select cases and resort
> to the default implementation to get add_stmt_cost events
> for unhandled groups.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
> Any comments?  Would that suit your needs?  For loads/stores
> my plan was to eventually stuff missing pieces (for costing)
> into vect_load_store_data of the SLP node (though I somewhat
> hate exposing that details to targets...)

This would suit my needs, and help us simplify blocks like

  if (kind == vec_promote_demote
      && (assign = dyn_cast <gassign *> (STMT_VINFO_STMT (stmt_info)))
      && gimple_assign_rhs_code (assign) == FLOAT_EXPR)
    {
      auto new_count = count * 2 - m_num_last_promote_demote;
      m_num_last_promote_demote = count;
      count = new_count;
    }
  else
    m_num_last_promote_demote = 0;

where we have to anticipate the expected promotion calls from the
middle end and scale appropriately so they line up.

vect_load_store_data does seem a bit like a can of work to expose
to targets.  Perhaps have a similar vect_load_store_data_for_costing
and have vect_load_store_data extend that so we can can separate
out the information we want to pass on for costing from the rest?

An accessor function/macro could then downcast.

One of the problems I have with load/store costing atm is that for
gathers/scatters I don't think we cost the instructions to make up
the address vectors (or at least couldn't find it).

So with something like

void f(int x, int a, int b, float c,
                 float *restrict d, float *restrict e,
                 int f, int g)
{
  for (int h = x; h; h++) {
    float i = 0;
    for (int j = 0; j < a; ++j) {
      float k = f ?: j, l = g ? d[h * j] : d[j * b + h];
      i += k * l;
    }
    e[h] = c * i;
  }
}

We end up thinking that

        mla     z23.s, p5/m, z5.s, z7.s
        ld1w    z3.s, p6/z, [x3, z23.s, sxtw 2]

is cheaper than it is. It would be nice to determine
such information I wouldn't have to look at the DRs
in target costing.

But I digress, 

> 
>       * tree-vectorizer.h (vector_costs::add_slp_cost): New.
>       * tree-vectorizer.cc (vector_costs::add_slp_cost): New
>       default version.
>       * tree-vect-slp.cc (add_slp_costs): Helper for dispatching
>       a cost vector in SLP chunks.
>       (vect_slp_analyze_operations): Adjust.
>       (li_cost_vec_cmp): Likewise.
>       (vect_bb_vectorization_profitable_p): Likewise.
> ---
>  gcc/tree-vect-slp.cc   | 30 +++++++++++++++++++++++++-----
>  gcc/tree-vectorizer.cc | 11 +++++++++++
>  gcc/tree-vectorizer.h  |  5 +++++
>  3 files changed, 41 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 7be8ca03763..54f6ccca6f3 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -9167,6 +9167,24 @@ vect_slp_prune_covered_roots (slp_tree node,
> hash_set<stmt_vec_info> &roots,
>        vect_slp_prune_covered_roots (child, roots, visited);
>  }
> 
> +/* Hand over COST_VEC to the target COSTS grouped by SLP node.  */
> +
> +static void
> +add_slp_costs (vector_costs *costs, stmt_vector_for_cost& cost_vec)
> +{
> +  for (unsigned start = 0; start < cost_vec.length ();)
> +    {
> +      unsigned end = start;
> +      while (end + 1 < cost_vec.length ()
> +          && cost_vec[start].node == cost_vec[end+1].node)
> +     end++;
> +      costs->add_slp_cost (cost_vec[start].node,
> +                        array_slice<stmt_info_for_cost>
> +                          (cost_vec.begin () + start, end - start + 1));
> +      start = end + 1;
> +    }
> +}
> +

This is probably slightly easier to follow if you use end = start + 1?
That should avoid the +1 bookkeeping everywhere.

Thanks,
Tamar

>  /* Analyze statements in SLP instances of VINFO.  Return true if the
>     operations are supported. */
> 
> @@ -9252,7 +9270,7 @@ vect_slp_analyze_operations (vec_info *vinfo)
>         i++;
>         if (loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo))
>           {
> -           add_stmt_costs (loop_vinfo->vector_costs, &cost_vec);
> +           add_slp_costs (loop_vinfo->vector_costs, cost_vec);
>             cost_vec.release ();
>           }
>         else
> @@ -9520,7 +9538,7 @@ vect_bb_slp_scalar_cost (bb_vec_info vinfo,
>  /* Comparator for the loop-index sorted cost vectors.  */
> 
>  static int
> -li_cost_vec_cmp (const void *a_, const void *b_)
> +li_cost_vec_cmp (const void *a_, const void *b_, void *)
>  {
>    auto *a = (const std::pair<unsigned, stmt_info_for_cost *> *)a_;
>    auto *b = (const std::pair<unsigned, stmt_info_for_cost *> *)b_;
> @@ -9610,8 +9628,8 @@ vect_bb_vectorization_profitable_p (bb_vec_info
> bb_vinfo,
>       l = gimple_bb (cost->stmt_info->stmt)->loop_father->num;
>        li_vector_costs.quick_push (std::make_pair (l, cost));
>      }
> -  li_scalar_costs.qsort (li_cost_vec_cmp);
> -  li_vector_costs.qsort (li_cost_vec_cmp);
> +  li_scalar_costs.stablesort (li_cost_vec_cmp, NULL);
> +  li_vector_costs.stablesort (li_cost_vec_cmp, NULL);
> 
>    /* Now cost the portions individually.  */
>    unsigned vi = 0;
> @@ -9651,13 +9669,15 @@ vect_bb_vectorization_profitable_p
> (bb_vec_info bb_vinfo,
> 
>        /* Complete the target-specific vector cost calculation.  */
>        class vector_costs *vect_target_cost_data = init_cost (bb_vinfo, 
> false);
> +      auto_vec<stmt_info_for_cost> tem;
>        do
>       {
> -       add_stmt_cost (vect_target_cost_data, li_vector_costs[vi].second);
> +       tem.safe_push (*li_vector_costs[vi].second);
>         vi++;
>       }
>        while (vi < li_vector_costs.length ()
>            && li_vector_costs[vi].first == vl);
> +      add_slp_costs (vect_target_cost_data, tem);
>        vect_target_cost_data->finish_cost (scalar_target_cost_data);
>        vec_prologue_cost = vect_target_cost_data->prologue_cost ();
>        vec_inside_cost = vect_target_cost_data->body_cost ();
> diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
> index 9e71f71fab7..613e4399dc9 100644
> --- a/gcc/tree-vectorizer.cc
> +++ b/gcc/tree-vectorizer.cc
> @@ -1845,6 +1845,17 @@ vector_costs::add_stmt_cost (int count,
> vect_cost_for_stmt kind,
>    return record_stmt_cost (stmt_info, where, cost);
>  }
> 
> +unsigned int
> +vector_costs::add_slp_cost (slp_tree,
> +                         const array_slice<stmt_info_for_cost> &cost_vec)
> +{
> +  unsigned int sum = 0;
> +  for (auto item : cost_vec)
> +    sum += add_stmt_cost (item.count, item.kind, item.stmt_info, item.node,
> +                       item.vectype, item.misalign, item.where);
> +  return sum;
> +}
> +
>  /* See the comment above the declaration for details.  */
> 
>  void
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index de50ed3277c..3a01e1be0f1 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -1777,6 +1777,11 @@ public:
>                                     tree vectype, int misalign,
>                                     vect_cost_model_location where);
> 
> +  /* Update the costs in response to adding costs in V which are all from
> +     vectorizing NODE to the respective part.  */
> +  virtual unsigned int add_slp_cost (slp_tree node,
> +                                  const array_slice<stmt_info_for_cost> &v);
> +
>    /* Finish calculating the cost of the code.  The results can be
>       read back using the functions below.
> 
> --
> 2.51.0

Reply via email to