On Wed, 13 May 2026, Tamar Christina wrote:
> > -----Original Message-----
> > From: Richard Biener <[email protected]>
> > Sent: 06 May 2026 14:03
> > To: [email protected]
> > Cc: Tamar Christina <[email protected]>
> > Subject: [PATCH] [RFC][v2] Add vector_costs::add_slp_cost grouping hook
> >
> > The following simplifies the earlier RFC for making it easier
> > for the target to correlate multiple cost events created from
> > a single SLP operation. Instead of changing where we record
> > costs this patch only adjusts the submission part.
> >
> > Targets wanting to take advantage of this can implement
> > add_slp_cost and handle all or select cases and resort
> > to the default implementation to get add_stmt_cost events
> > for unhandled groups.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> > Any comments? Would that suit your needs? For loads/stores
> > my plan was to eventually stuff missing pieces (for costing)
> > into vect_load_store_data of the SLP node (though I somewhat
> > hate exposing that details to targets...)
>
> This would suit my needs, and help us simplify blocks like
>
> if (kind == vec_promote_demote
> && (assign = dyn_cast <gassign *> (STMT_VINFO_STMT (stmt_info)))
> && gimple_assign_rhs_code (assign) == FLOAT_EXPR)
> {
> auto new_count = count * 2 - m_num_last_promote_demote;
> m_num_last_promote_demote = count;
> count = new_count;
> }
> else
> m_num_last_promote_demote = 0;
>
> where we have to anticipate the expected promotion calls from the
> middle end and scale appropriately so they line up.
>
> vect_load_store_data does seem a bit like a can of work to expose
> to targets. Perhaps have a similar vect_load_store_data_for_costing
> and have vect_load_store_data extend that so we can can separate
> out the information we want to pass on for costing from the rest?
>
> An accessor function/macro could then downcast.
My idea was to introduce select accessor functions, like we have
SLP_TREE_MEMORY_ACCESS_TYPE, so no vect_load_store_data_for_costing
needed. But yes, basically evolve the API based on uses that
emerge.
> One of the problems I have with load/store costing atm is that for
> gathers/scatters I don't think we cost the instructions to make up
> the address vectors (or at least couldn't find it).
>
> So with something like
>
> void f(int x, int a, int b, float c,
> float *restrict d, float *restrict e,
> int f, int g)
> {
> for (int h = x; h; h++) {
> float i = 0;
> for (int j = 0; j < a; ++j) {
> float k = f ?: j, l = g ? d[h * j] : d[j * b + h];
> i += k * l;
> }
> e[h] = c * i;
> }
> }
>
> We end up thinking that
>
> mla z23.s, p5/m, z5.s, z7.s
> ld1w z3.s, p6/z, [x3, z23.s, sxtw 2]
>
> is cheaper than it is. It would be nice to determine
> such information I wouldn't have to look at the DRs
> in target costing.
Hmm, address costing is part of the "instruction". Computation
of the offset vector is in separate SLP nodes. For emulated
gather/scatter the vectorizer emits vec_to_scalar (for offset),
scalar_load/store (the gather/scatter) and vec_to_scalar (for
data and store) or vec_construct (for data and load) pieces.
So I'm not sure what you are after.
> But I digress,
>
> >
> > * tree-vectorizer.h (vector_costs::add_slp_cost): New.
> > * tree-vectorizer.cc (vector_costs::add_slp_cost): New
> > default version.
> > * tree-vect-slp.cc (add_slp_costs): Helper for dispatching
> > a cost vector in SLP chunks.
> > (vect_slp_analyze_operations): Adjust.
> > (li_cost_vec_cmp): Likewise.
> > (vect_bb_vectorization_profitable_p): Likewise.
> > ---
> > gcc/tree-vect-slp.cc | 30 +++++++++++++++++++++++++-----
> > gcc/tree-vectorizer.cc | 11 +++++++++++
> > gcc/tree-vectorizer.h | 5 +++++
> > 3 files changed, 41 insertions(+), 5 deletions(-)
> >
> > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > index 7be8ca03763..54f6ccca6f3 100644
> > --- a/gcc/tree-vect-slp.cc
> > +++ b/gcc/tree-vect-slp.cc
> > @@ -9167,6 +9167,24 @@ vect_slp_prune_covered_roots (slp_tree node,
> > hash_set<stmt_vec_info> &roots,
> > vect_slp_prune_covered_roots (child, roots, visited);
> > }
> >
> > +/* Hand over COST_VEC to the target COSTS grouped by SLP node. */
> > +
> > +static void
> > +add_slp_costs (vector_costs *costs, stmt_vector_for_cost& cost_vec)
> > +{
> > + for (unsigned start = 0; start < cost_vec.length ();)
> > + {
> > + unsigned end = start;
> > + while (end + 1 < cost_vec.length ()
> > + && cost_vec[start].node == cost_vec[end+1].node)
> > + end++;
> > + costs->add_slp_cost (cost_vec[start].node,
> > + array_slice<stmt_info_for_cost>
> > + (cost_vec.begin () + start, end - start + 1));
> > + start = end + 1;
> > + }
> > +}
> > +
>
> This is probably slightly easier to follow if you use end = start + 1?
> That should avoid the +1 bookkeeping everywhere.
Good point. Will adjust that, re-test and push.
Richard.
> Thanks,
> Tamar
>
> > /* Analyze statements in SLP instances of VINFO. Return true if the
> > operations are supported. */
> >
> > @@ -9252,7 +9270,7 @@ vect_slp_analyze_operations (vec_info *vinfo)
> > i++;
> > if (loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo))
> > {
> > - add_stmt_costs (loop_vinfo->vector_costs, &cost_vec);
> > + add_slp_costs (loop_vinfo->vector_costs, cost_vec);
> > cost_vec.release ();
> > }
> > else
> > @@ -9520,7 +9538,7 @@ vect_bb_slp_scalar_cost (bb_vec_info vinfo,
> > /* Comparator for the loop-index sorted cost vectors. */
> >
> > static int
> > -li_cost_vec_cmp (const void *a_, const void *b_)
> > +li_cost_vec_cmp (const void *a_, const void *b_, void *)
> > {
> > auto *a = (const std::pair<unsigned, stmt_info_for_cost *> *)a_;
> > auto *b = (const std::pair<unsigned, stmt_info_for_cost *> *)b_;
> > @@ -9610,8 +9628,8 @@ vect_bb_vectorization_profitable_p (bb_vec_info
> > bb_vinfo,
> > l = gimple_bb (cost->stmt_info->stmt)->loop_father->num;
> > li_vector_costs.quick_push (std::make_pair (l, cost));
> > }
> > - li_scalar_costs.qsort (li_cost_vec_cmp);
> > - li_vector_costs.qsort (li_cost_vec_cmp);
> > + li_scalar_costs.stablesort (li_cost_vec_cmp, NULL);
> > + li_vector_costs.stablesort (li_cost_vec_cmp, NULL);
> >
> > /* Now cost the portions individually. */
> > unsigned vi = 0;
> > @@ -9651,13 +9669,15 @@ vect_bb_vectorization_profitable_p
> > (bb_vec_info bb_vinfo,
> >
> > /* Complete the target-specific vector cost calculation. */
> > class vector_costs *vect_target_cost_data = init_cost (bb_vinfo,
> > false);
> > + auto_vec<stmt_info_for_cost> tem;
> > do
> > {
> > - add_stmt_cost (vect_target_cost_data, li_vector_costs[vi].second);
> > + tem.safe_push (*li_vector_costs[vi].second);
> > vi++;
> > }
> > while (vi < li_vector_costs.length ()
> > && li_vector_costs[vi].first == vl);
> > + add_slp_costs (vect_target_cost_data, tem);
> > vect_target_cost_data->finish_cost (scalar_target_cost_data);
> > vec_prologue_cost = vect_target_cost_data->prologue_cost ();
> > vec_inside_cost = vect_target_cost_data->body_cost ();
> > diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
> > index 9e71f71fab7..613e4399dc9 100644
> > --- a/gcc/tree-vectorizer.cc
> > +++ b/gcc/tree-vectorizer.cc
> > @@ -1845,6 +1845,17 @@ vector_costs::add_stmt_cost (int count,
> > vect_cost_for_stmt kind,
> > return record_stmt_cost (stmt_info, where, cost);
> > }
> >
> > +unsigned int
> > +vector_costs::add_slp_cost (slp_tree,
> > + const array_slice<stmt_info_for_cost> &cost_vec)
> > +{
> > + unsigned int sum = 0;
> > + for (auto item : cost_vec)
> > + sum += add_stmt_cost (item.count, item.kind, item.stmt_info, item.node,
> > + item.vectype, item.misalign, item.where);
> > + return sum;
> > +}
> > +
> > /* See the comment above the declaration for details. */
> >
> > void
> > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > index de50ed3277c..3a01e1be0f1 100644
> > --- a/gcc/tree-vectorizer.h
> > +++ b/gcc/tree-vectorizer.h
> > @@ -1777,6 +1777,11 @@ public:
> > tree vectype, int misalign,
> > vect_cost_model_location where);
> >
> > + /* Update the costs in response to adding costs in V which are all from
> > + vectorizing NODE to the respective part. */
> > + virtual unsigned int add_slp_cost (slp_tree node,
> > + const array_slice<stmt_info_for_cost> &v);
> > +
> > /* Finish calculating the cost of the code. The results can be
> > read back using the functions below.
> >
> > --
> > 2.51.0
>
--
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Jochen Jaser, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)