RE: [PATCH] [RFC][v2] Add vector_costs::add_slp_cost grouping hook

Richard Biener Wed, 13 May 2026 04:05:32 -0700

On Wed, 13 May 2026, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <[email protected]>
> > Sent: 06 May 2026 14:03
> > To: [email protected]
> > Cc: Tamar Christina <[email protected]>
> > Subject: [PATCH] [RFC][v2] Add vector_costs::add_slp_cost grouping hook
> > 
> > The following simplifies the earlier RFC for making it easier
> > for the target to correlate multiple cost events created from
> > a single SLP operation.  Instead of changing where we record
> > costs this patch only adjusts the submission part.
> > 
> > Targets wanting to take advantage of this can implement
> > add_slp_cost and handle all or select cases and resort
> > to the default implementation to get add_stmt_cost events
> > for unhandled groups.
> > 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > 
> > Any comments?  Would that suit your needs?  For loads/stores
> > my plan was to eventually stuff missing pieces (for costing)
> > into vect_load_store_data of the SLP node (though I somewhat
> > hate exposing that details to targets...)
> 
> This would suit my needs, and help us simplify blocks like
> 
>   if (kind == vec_promote_demote
>       && (assign = dyn_cast <gassign *> (STMT_VINFO_STMT (stmt_info)))
>       && gimple_assign_rhs_code (assign) == FLOAT_EXPR)
>     {
>       auto new_count = count * 2 - m_num_last_promote_demote;
>       m_num_last_promote_demote = count;
>       count = new_count;
>     }
>   else
>     m_num_last_promote_demote = 0;
> 
> where we have to anticipate the expected promotion calls from the
> middle end and scale appropriately so they line up.
> 
> vect_load_store_data does seem a bit like a can of work to expose
> to targets.  Perhaps have a similar vect_load_store_data_for_costing
> and have vect_load_store_data extend that so we can can separate
> out the information we want to pass on for costing from the rest?
> 
> An accessor function/macro could then downcast.


My idea was to introduce select accessor functions, like we have
SLP_TREE_MEMORY_ACCESS_TYPE, so no vect_load_store_data_for_costing
needed.  But yes, basically evolve the API based on uses that
emerge.

> One of the problems I have with load/store costing atm is that for
> gathers/scatters I don't think we cost the instructions to make up
> the address vectors (or at least couldn't find it).
> 
> So with something like
> 
> void f(int x, int a, int b, float c,
>                  float *restrict d, float *restrict e,
>                  int f, int g)
> {
>   for (int h = x; h; h++) {
>     float i = 0;
>     for (int j = 0; j < a; ++j) {
>       float k = f ?: j, l = g ? d[h * j] : d[j * b + h];
>       i += k * l;
>     }
>     e[h] = c * i;
>   }
> }
> 
> We end up thinking that
> 
>         mla     z23.s, p5/m, z5.s, z7.s
>         ld1w    z3.s, p6/z, [x3, z23.s, sxtw 2]
> 
> is cheaper than it is. It would be nice to determine
> such information I wouldn't have to look at the DRs
> in target costing.

Hmm, address costing is part of the "instruction".  Computation
of the offset vector is in separate SLP nodes.  For emulated
gather/scatter the vectorizer emits vec_to_scalar (for offset),
scalar_load/store (the gather/scatter) and vec_to_scalar (for
data and store) or vec_construct (for data and load) pieces.

So I'm not sure what you are after.

> But I digress, 
> 
> > 
> >     * tree-vectorizer.h (vector_costs::add_slp_cost): New.
> >     * tree-vectorizer.cc (vector_costs::add_slp_cost): New
> >     default version.
> >     * tree-vect-slp.cc (add_slp_costs): Helper for dispatching
> >     a cost vector in SLP chunks.
> >     (vect_slp_analyze_operations): Adjust.
> >     (li_cost_vec_cmp): Likewise.
> >     (vect_bb_vectorization_profitable_p): Likewise.
> > ---
> >  gcc/tree-vect-slp.cc   | 30 +++++++++++++++++++++++++-----
> >  gcc/tree-vectorizer.cc | 11 +++++++++++
> >  gcc/tree-vectorizer.h  |  5 +++++
> >  3 files changed, 41 insertions(+), 5 deletions(-)
> > 
> > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > index 7be8ca03763..54f6ccca6f3 100644
> > --- a/gcc/tree-vect-slp.cc
> > +++ b/gcc/tree-vect-slp.cc
> > @@ -9167,6 +9167,24 @@ vect_slp_prune_covered_roots (slp_tree node,
> > hash_set<stmt_vec_info> &roots,
> >        vect_slp_prune_covered_roots (child, roots, visited);
> >  }
> > 
> > +/* Hand over COST_VEC to the target COSTS grouped by SLP node.  */
> > +
> > +static void
> > +add_slp_costs (vector_costs *costs, stmt_vector_for_cost& cost_vec)
> > +{
> > +  for (unsigned start = 0; start < cost_vec.length ();)
> > +    {
> > +      unsigned end = start;
> > +      while (end + 1 < cost_vec.length ()
> > +        && cost_vec[start].node == cost_vec[end+1].node)
> > +   end++;
> > +      costs->add_slp_cost (cost_vec[start].node,
> > +                      array_slice<stmt_info_for_cost>
> > +                        (cost_vec.begin () + start, end - start + 1));
> > +      start = end + 1;
> > +    }
> > +}
> > +
> 
> This is probably slightly easier to follow if you use end = start + 1?
> That should avoid the +1 bookkeeping everywhere.

Good point.  Will adjust that, re-test and push.

Richard.

> Thanks,
> Tamar
> 
> >  /* Analyze statements in SLP instances of VINFO.  Return true if the
> >     operations are supported. */
> > 
> > @@ -9252,7 +9270,7 @@ vect_slp_analyze_operations (vec_info *vinfo)
> >       i++;
> >       if (loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo))
> >         {
> > -         add_stmt_costs (loop_vinfo->vector_costs, &cost_vec);
> > +         add_slp_costs (loop_vinfo->vector_costs, cost_vec);
> >           cost_vec.release ();
> >         }
> >       else
> > @@ -9520,7 +9538,7 @@ vect_bb_slp_scalar_cost (bb_vec_info vinfo,
> >  /* Comparator for the loop-index sorted cost vectors.  */
> > 
> >  static int
> > -li_cost_vec_cmp (const void *a_, const void *b_)
> > +li_cost_vec_cmp (const void *a_, const void *b_, void *)
> >  {
> >    auto *a = (const std::pair<unsigned, stmt_info_for_cost *> *)a_;
> >    auto *b = (const std::pair<unsigned, stmt_info_for_cost *> *)b_;
> > @@ -9610,8 +9628,8 @@ vect_bb_vectorization_profitable_p (bb_vec_info
> > bb_vinfo,
> >     l = gimple_bb (cost->stmt_info->stmt)->loop_father->num;
> >        li_vector_costs.quick_push (std::make_pair (l, cost));
> >      }
> > -  li_scalar_costs.qsort (li_cost_vec_cmp);
> > -  li_vector_costs.qsort (li_cost_vec_cmp);
> > +  li_scalar_costs.stablesort (li_cost_vec_cmp, NULL);
> > +  li_vector_costs.stablesort (li_cost_vec_cmp, NULL);
> > 
> >    /* Now cost the portions individually.  */
> >    unsigned vi = 0;
> > @@ -9651,13 +9669,15 @@ vect_bb_vectorization_profitable_p
> > (bb_vec_info bb_vinfo,
> > 
> >        /* Complete the target-specific vector cost calculation.  */
> >        class vector_costs *vect_target_cost_data = init_cost (bb_vinfo, 
> > false);
> > +      auto_vec<stmt_info_for_cost> tem;
> >        do
> >     {
> > -     add_stmt_cost (vect_target_cost_data, li_vector_costs[vi].second);
> > +     tem.safe_push (*li_vector_costs[vi].second);
> >       vi++;
> >     }
> >        while (vi < li_vector_costs.length ()
> >          && li_vector_costs[vi].first == vl);
> > +      add_slp_costs (vect_target_cost_data, tem);
> >        vect_target_cost_data->finish_cost (scalar_target_cost_data);
> >        vec_prologue_cost = vect_target_cost_data->prologue_cost ();
> >        vec_inside_cost = vect_target_cost_data->body_cost ();
> > diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
> > index 9e71f71fab7..613e4399dc9 100644
> > --- a/gcc/tree-vectorizer.cc
> > +++ b/gcc/tree-vectorizer.cc
> > @@ -1845,6 +1845,17 @@ vector_costs::add_stmt_cost (int count,
> > vect_cost_for_stmt kind,
> >    return record_stmt_cost (stmt_info, where, cost);
> >  }
> > 
> > +unsigned int
> > +vector_costs::add_slp_cost (slp_tree,
> > +                       const array_slice<stmt_info_for_cost> &cost_vec)
> > +{
> > +  unsigned int sum = 0;
> > +  for (auto item : cost_vec)
> > +    sum += add_stmt_cost (item.count, item.kind, item.stmt_info, item.node,
> > +                     item.vectype, item.misalign, item.where);
> > +  return sum;
> > +}
> > +
> >  /* See the comment above the declaration for details.  */
> > 
> >  void
> > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > index de50ed3277c..3a01e1be0f1 100644
> > --- a/gcc/tree-vectorizer.h
> > +++ b/gcc/tree-vectorizer.h
> > @@ -1777,6 +1777,11 @@ public:
> >                                   tree vectype, int misalign,
> >                                   vect_cost_model_location where);
> > 
> > +  /* Update the costs in response to adding costs in V which are all from
> > +     vectorizing NODE to the respective part.  */
> > +  virtual unsigned int add_slp_cost (slp_tree node,
> > +                                const array_slice<stmt_info_for_cost> &v);
> > +
> >    /* Finish calculating the cost of the code.  The results can be
> >       read back using the functions below.
> > 
> > --
> > 2.51.0
> 

-- 
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Jochen Jaser, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

RE: [PATCH] [RFC][v2] Add vector_costs::add_slp_cost grouping hook

Reply via email to