Re: [PATCH] middle-end/117801 - failed register coalescing due to GIMPLE schedule

Richard Biener Thu, 05 Dec 2024 05:39:22 -0800

On Thu, 5 Dec 2024, Richard Sandiford wrote:

> Richard Biener <rguent...@suse.de> writes:
> > On Fri, 29 Nov 2024, Jakub Jelinek wrote:
> >
> >> On Fri, Nov 29, 2024 at 09:19:55AM +0100, Richard Biener wrote:
> >> > For a TSVC testcase we see failed register coalescing due to a
> >> > different schedule of GIMPLE .FMA and stores fed by it.  This
> >> > can be mitigated by making direct internal functions participate
> >> > in TER - given we're using more and more of such functions to
> >> > expose target capabilities it seems to be a natural thing to not
> >> > exempt those.
> >> > 
> >> > Unfortunately the internal function expanding API doesn't match
> >> > what we usually have - passing in a target and returning an RTX
> >> > but instead the LHS of the call is expanded and written to.  This
> >> > makes the TER expansion of a call SSA def a bit unwieldly.
> >> 
> >> Can't we change that?
> >> Especially if it is only for the easiest subset of internal fns
> >> (I see you limit it only to direct_internal_fn_p), if it has just
> >> one or a couple of easy implementations, those could be split into
> >> one which handles the whole thing by just expanding lhs and calling
> >> another function with the rtx target argument into which to store
> >> stuff (or const0_rtx for ignored result?) and handle the actual expansion,
> >> and then have an exported function from internal-fn.cc which expr.cc
> >> could call for the TERed internal-fn case.
> >> That function could assert it is only direct_internal_fn_p or some
> >> other subset which it would handle.
> >
> > The expander goes through macro-generated expand_FOO (see top of
> > internal-fn.cc), and in the end dispatches to expand_*_optab_fn
> > of which there is a generic one for UNARY, BINARY and TERNARY
> > but very many OPTAB_NAME variants, like expand_fold_len_extract_optab_fn
> > dispatching to expand_direct_optab_fn or complex ones like
> > expand_gather_load_optab_fn.  There's unfortunately no good way
> > to factor out a different API there, at least not easily.
> 
> Converting everything to a new, single API seems feasible as
> far as the refactoring goes.  But then I wonder what the API
> should be.
> 
> One option would be:
> 
>   void expand_FOO (internal_fn fn, gcall *call, rtx lhs_rtx)
> 
> where the rtx is provided by the caller and is nonnull whenever
> the lhs is needed.  (This is different from the usual "target"
> argument.)  For this option, the dispatcher would use a single:
> 
>   lhs_rtx = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
> 
> when no lhs is provided by the caller.
> 
> But this option assumes that the store to the lhs can always be done
> on rtl, with (for example) none of the special casing in expand_assignment.
> That's probably true now, but it would be a difficult assumption to
> reverse later.
> 
> Another option would be the usual:
> 
>   rtx expand_FOO (internal_fn fn, gcall *call, rtx target)
> 
> which returns the result as an rtx, with TARGET just being
> a suggestion.  But as discussed, this doesn't really fit the
> semantics of things like LOAD_LANES and STORE_LANES very well,
> where the lhs is an aggregate.
> 
> More generally, I suppose the question with this option is: when
> can the expander ignore the call's lhs?  Always, with the caller
> handling the fallout?  When target is nonnull?  When the lhs is
> an SSA_NAME?  Or something else?  The first option complicates
> callers, and doesn't fit LOAD/STORE_LANES well, whereas the others
> effectively create two cases for each function.
> 
> IMO, the current interface is the natural one for its current callers
> (both in cfgexpand).  And if TER is something that we want to remove,
> maybe it's not worth changing everything over to a different interface
> just for that.


Exactly the questions I asked myself btw. - for consistency in the
place I had to add IFN expansion the only useful new API would be
the classic with a target hint and a returned lhs RTX.

Given we're dealing with optabs the optab (aka the call) does
deal with operands[0], it's target.  I think in general we can't
do without any of the expand_assignment handling though, but I
think that's on the burden of the caller who provides 'target'
in case that target is ignored - it would need to deal with
moving the result RTX to 'target' somehow.

> So having played around with a few possibilities locally, I've kind-of
> grown to like the patch :)

It seems to work, though I have no high hopes on test coverage from
bootstrap & regtest.  The precommit CI only managed to test on arm,
aarch64 only built.

If people are fine I'd push today or early tomorrow and will revert
on any fallout early next week.

Does that sound OK?

Thanks,
Richard.

Re: [PATCH] middle-end/117801 - failed register coalescing due to GIMPLE schedule

Reply via email to