> -----Original Message-----
> From: Richard Biener <[email protected]>
> Sent: 15 May 2025 07:48
> To: Tamar Christina <[email protected]>
> Cc: Richard Sandiford <[email protected]>; gcc-
> [email protected]
> Subject: RE: [PATCH][RFC] Add vector_costs::add_vector_cost vector stmt
> grouping hook
> 
> On Wed, 14 May 2025, Tamar Christina wrote:
> 
> > > -----Original Message-----
> > > From: Richard Biener <[email protected]>
> > > Sent: Tuesday, May 13, 2025 12:08 PM
> > > To: Richard Sandiford <[email protected]>
> > > Cc: [email protected]; Tamar Christina
> <[email protected]>
> > > Subject: Re: [PATCH][RFC] Add vector_costs::add_vector_cost vector stmt
> > > grouping hook
> > >
> > > On Tue, 13 May 2025, Richard Sandiford wrote:
> > >
> > > > Richard Biener <[email protected]> writes:
> > > > > The following refactors the vectorizer vector_costs target API
> > > > > to add a new vector_costs::add_vector_cost entry which groups
> > > > > all individual sub-stmts we create per "vector stmt", aka SLP
> > > > > node.  This allows for the targets to more easily match on
> > > > > complex cases like emulated gather/scatter or even just vector
> > > > > construction.
> > > > >
> > > > > The patch itself is just a prototype and leaves out BB vectorization
> > > > > for simplicity.  It also does not fully group all vector stmts
> > > > > but leaves some bare add_stmt_cost hook invocations.  I'd expect
> > > > > > the add_stmt_cost hook to still be used for scalar stmt costing and
> > > > > for costing added branching around prologue/epilogue.  The
> > > > > default implementation of add_vector_cost just dispatches to
> > > > > add_stmt_cost for individual stmts.  Eventually the actual data
> > > > > we track for the combined costing will diverge (no need to track
> > > > > SLP node or stmt_info there?), so targets would eventually be
> > > > > expected to implement both hooks and splice out common workers
> > > > > to deal with "missing" information coming in from the different
> > > > > entries.
> > > > >
> > > > > This should eventually baby-step us towards the generic vectorizer
> > > > > code being able to compute and compare latency and resource
> > > > > utilization throughout the scalar / vector loop iteration based
> > > > > > on latency and throughput data determined on a stmt-by-stmt basis
> > > > > from the target.  As given the grouping should be an incremental
> > > > > improvement, but I have not tried to see how it can simplify
> > > > > the x86 hook implementation - I've been triggered by the aarch64
> > > > > reported bootstrap fail on the cleanup RFC I posted given that
> > > > > code wants to identify a scalar load that's costed as part of
> > > > > a gather/scatter operation.
> > > > >
> > > > > > Any comments or problems you foresee?
> > > >
> > > > Could the stmt_vector_for_cost pointer instead be passed to
> > > > TARGET_VECTORIZE_CREATE_COSTS?  The danger with passing it to
> > > > add_vector_cost is that the same vector_costs instance might get used
> > > > for multiple different costing attempts, so that only the provided
> > > > stmt_vector_for_costs are specific to the current costing attempt.
> > > > But for complex cases, the target's vector_costs should be able
> > > > to cache its own target-specific information, with the same
> > > > lifetime/scope as the stmt_vector_for_costs.
> > >
> > > > It cannot be passed to TARGET_VECTORIZE_CREATE_COSTS - but I could
> > > > simply not pass it at all; in the proposed implementation it is
> > > > actually node->cost_vec.  It's the set of stmts we cost for
> > > a single SLP node.  I'm not sure the "group" is what targets
> > > would cache, they'd rather cache whatever they make from the
> > > group and its contents?
> > >
> > > That said, the most aggressive way of handling it would be
> > > to defer everything to the target and just pass in the
> > > set of SLP instances to TARGET_VECTORIZE_CREATE_COSTS and
> > > not perform any individual add_stmt_cost calls at all, but expect
> > > the target to walk the SLP graph at finish_cost () time.
> > >
> >
> > I was actually wondering whether it wouldn't be indeed better to cost
> > the slp_instances as those contain roots that would need to be costed
> > too.
> 
> Yes, I need to think about that.  But it's also that in practice
> BB vectorization costing will work quite differently from loop
> costing since for BB vectorization there's no implicit unrolling
> and you have to think about surrounding stmts.
> 
> > For early break, if we're costing purely based on SLP nodes then the
> > actual break itself can't be costed as it's not in the node.  We'd need
> > to be able to cost it in order to re-order the exits during SLP
> > scheduling based on their actual cost.
> 
> Note the proposed prototype patch still gets you add_stmt_cost
> hook calls for the non-SLP stmts, it's just an easy way to
> let the target know that costed sub-stmts belong to the same
> SLP tree.
> 
> I'll put this on the side for now.

I've hit a few cases now where such a change would have been useful.
The most recent one was the costing of LOAD_LANES with gaps
and gather/scatter addressing.

I think this patch was a step in the right direction.  At the very least it
would spare targets from having to "match up" individual costing calls back
to a group.

Thanks,
Tamar

> 
> Richard.
> 
> >
> > Cheers,
> > Tamar
> >
> > > The x86 target currently keeps counters of certain ops but
> > > does not cache the full-blown stmts from add_stmt_cost for
> > > computing the overall cost at finish_cost.  I'll have to look
> > > what aarch64 does here.
> > >
> > > > Ultimately I'd like to take into account stmt dependences
> > > during costing - at the moment we are asking the target to
> > > compute per stmt "latencies" but then we just sum those.
> > > One improvement would be to compute the max latency through
> > > the graph and the maximum width (without having throughput
> > > or port assignments and an actual scheduler implementation).
> > >
> > > Richard.
> > >
> > > >
> > > > Thanks,
> > > > Richard
> > > >
> > >
> > > --
> > > Richard Biener <[email protected]>
> > > SUSE Software Solutions Germany GmbH,
> > > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> Nuernberg)
> >
> 
> --
> Richard Biener <[email protected]>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> Nuernberg)
