Re: [PATCH] middle-end/117801 - failed register coalescing due to GIMPLE schedule

Richard Sandiford Thu, 05 Dec 2024 06:39:17 -0800

Richard Biener <rguent...@suse.de> writes:
> On Thu, 5 Dec 2024, Richard Sandiford wrote:
>
>> Richard Biener <rguent...@suse.de> writes:
>> > On Fri, 29 Nov 2024, Jakub Jelinek wrote:
>> >
>> >> On Fri, Nov 29, 2024 at 09:19:55AM +0100, Richard Biener wrote:
>> >> > For a TSVC testcase we see failed register coalescing due to a
>> >> > different schedule of GIMPLE .FMA and stores fed by it.  This
>> >> > can be mitigated by making direct internal functions participate
>> >> > in TER - given we're using more and more of such functions to
>> >> > expose target capabilities it seems to be a natural thing to not
>> >> > exempt those.
>> >> > 
>> >> > Unfortunately the internal function expanding API doesn't match
>> >> > what we usually have - passing in a target and returning an RTX
>> >> > but instead the LHS of the call is expanded and written to.  This
>> >> > makes the TER expansion of a call SSA def a bit unwieldly.
>> >> 
>> >> Can't we change that?
>> >> Especially if it is only for the easiest subset of internal fns
>> >> (I see you limit it only to direct_internal_fn_p), if it has just
>> >> one or a couple of easy implementations, those could be split into
>> >> one which handles the whole thing by just expanding lhs and calling
>> >> another function with the rtx target argument into which to store
>> >> stuff (or const0_rtx for ignored result?) and handle the actual expansion,
>> >> and then have an exported function from internal-fn.cc which expr.cc
>> >> could call for the TERed internal-fn case.
>> >> That function could assert it is only direct_internal_fn_p or some
>> >> other subset which it would handle.
>> >
>> > The expander goes through macro-generated expand_FOO (see top of
>> > internal-fn.cc), and in the end dispatches to expand_*_optab_fn
>> > of which there is a generic one for UNARY, BINARY and TERNARY
>> > but very many OPTAB_NAME variants, like expand_fold_len_extract_optab_fn
>> > dispatching to expand_direct_optab_fn or complex ones like
>> > expand_gather_load_optab_fn.  There's unfortunately no good way
>> > to factor out a different API there, at least not easily.
>> 
>> Converting everything to a new, single API seems feasible as
>> far as the refactoring goes.  But then I wonder what the API
>> should be.
>> 
>> One option would be:
>> 
>>   void expand_FOO (internal_fn fn, gcall *call, rtx lhs_rtx)
>> 
>> where the rtx is provided by the caller and is nonnull whenever
>> the lhs is needed.  (This is different from the usual "target"
>> argument.)  For this option, the dispatcher would use a single:
>> 
>>   lhs_rtx = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
>> 
>> when no lhs is provided by the caller.
>> 
>> But this option assumes that the store to the lhs can always be done
>> on rtl, with (for example) none of the special casing in expand_assignment.
>> That's probably true now, but it would be a difficult assumption to
>> reverse later.
>> 
>> Another option would be the usual:
>> 
>>   rtx expand_FOO (internal_fn fn, gcall *call, rtx target)
>> 
>> which returns the result as an rtx, with TARGET just being
>> a suggestion.  But as discussed, this doesn't really fit the
>> semantics of things like LOAD_LANES and STORE_LANES very well,
>> where the lhs is an aggregate.
>> 
>> More generally, I suppose the question with this option is: when
>> can the expander ignore the call's lhs?  Always, with the caller
>> handling the fallout?  When target is nonnull?  When the lhs is
>> an SSA_NAME?  Or something else?  The first option complicates
>> callers, and doesn't fit LOAD/STORE_LANES well, whereas the others
>> effectively create two cases for each function.
>> 
>> IMO, the current interface is the natural one for its current callers
>> (both in cfgexpand).  And if TER is something that we want to remove,
>> maybe it's not worth changing everything over to a different interface
>> just for that.
>
> Exactly the questions I asked myself btw. - for consistency in the
> place I had to add IFN expansion the only useful new API would be
> the classic with a target hint and a returned lhs RTX.
>
> Given we're dealing with optabs the optab (aka the call) does
> deal with operands[0], it's target.  I think in general we can't
> do without any of the expand_assignment handling though, but I
> think that's on the burden of the caller who provides 'target'
> in case that target is ignored - it would need to deal with
> moving the result RTX to 'target' somehow.
>
>> So having played around with a few possibilities locally, I've kind-of
>> grown to like the patch :)
>
> It seems to work, though I have no high hopes on test coverage from
> bootstrap & regtest.  The precommit CI only managed to test on arm,
> aarch64 only built.
>
> If people are fine I'd push today or early tomorrow and will revert
> on any fallout early next week.
>
> Does that sound OK?


Yeah, sounds good to me FWIW.

Thanks,
Richard

Re: [PATCH] middle-end/117801 - failed register coalescing due to GIMPLE schedule

Reply via email to