Richard Biener <rguent...@suse.de> writes: > On Thu, 5 Dec 2024, Richard Sandiford wrote: > >> Richard Biener <rguent...@suse.de> writes: >> > On Fri, 29 Nov 2024, Jakub Jelinek wrote: >> > >> >> On Fri, Nov 29, 2024 at 09:19:55AM +0100, Richard Biener wrote: >> >> > For a TSVC testcase we see failed register coalescing due to a >> >> > different schedule of GIMPLE .FMA and stores fed by it. This >> >> > can be mitigated by making direct internal functions participate >> >> > in TER - given we're using more and more of such functions to >> >> > expose target capabilities it seems to be a natural thing to not >> >> > exempt those. >> >> > >> >> > Unfortunately the internal function expanding API doesn't match >> >> > what we usually have - passing in a target and returning an RTX >> >> > but instead the LHS of the call is expanded and written to. This >> >> > makes the TER expansion of a call SSA def a bit unwieldly. >> >> >> >> Can't we change that? >> >> Especially if it is only for the easiest subset of internal fns >> >> (I see you limit it only to direct_internal_fn_p), if it has just >> >> one or a couple of easy implementations, those could be split into >> >> one which handles the whole thing by just expanding lhs and calling >> >> another function with the rtx target argument into which to store >> >> stuff (or const0_rtx for ignored result?) and handle the actual expansion, >> >> and then have an exported function from internal-fn.cc which expr.cc >> >> could call for the TERed internal-fn case. >> >> That function could assert it is only direct_internal_fn_p or some >> >> other subset which it would handle. >> > >> > The expander goes through macro-generated expand_FOO (see top of >> > internal-fn.cc), and in the end dispatches to expand_*_optab_fn >> > of which there is a generic one for UNARY, BINARY and TERNARY >> > but very many OPTAB_NAME variants, like expand_fold_len_extract_optab_fn >> > dispatching to expand_direct_optab_fn or complex ones like >> > expand_gather_load_optab_fn. There's unfortunately no good way >> > to factor out a different API there, at least not easily. >> >> Converting everything to a new, single API seems feasible as >> far as the refactoring goes. But then I wonder what the API >> should be. >> >> One option would be: >> >> void expand_FOO (internal_fn fn, gcall *call, rtx lhs_rtx) >> >> where the rtx is provided by the caller and is nonnull whenever >> the lhs is needed. (This is different from the usual "target" >> argument.) For this option, the dispatcher would use a single: >> >> lhs_rtx = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); >> >> when no lhs is provided by the caller. >> >> But this option assumes that the store to the lhs can always be done >> on rtl, with (for example) none of the special casing in expand_assignment. >> That's probably true now, but it would be a difficult assumption to >> reverse later. >> >> Another option would be the usual: >> >> rtx expand_FOO (internal_fn fn, gcall *call, rtx target) >> >> which returns the result as an rtx, with TARGET just being >> a suggestion. But as discussed, this doesn't really fit the >> semantics of things like LOAD_LANES and STORE_LANES very well, >> where the lhs is an aggregate. >> >> More generally, I suppose the question with this option is: when >> can the expander ignore the call's lhs? Always, with the caller >> handling the fallout? When target is nonnull? When the lhs is >> an SSA_NAME? Or something else? The first option complicates >> callers, and doesn't fit LOAD/STORE_LANES well, whereas the others >> effectively create two cases for each function. >> >> IMO, the current interface is the natural one for its current callers >> (both in cfgexpand). And if TER is something that we want to remove, >> maybe it's not worth changing everything over to a different interface >> just for that. > > Exactly the questions I asked myself btw. - for consistency in the > place I had to add IFN expansion the only useful new API would be > the classic with a target hint and a returned lhs RTX. > > Given we're dealing with optabs the optab (aka the call) does > deal with operands[0], it's target. I think in general we can't > do without any of the expand_assignment handling though, but I > think that's on the burden of the caller who provides 'target' > in case that target is ignored - it would need to deal with > moving the result RTX to 'target' somehow. > >> So having played around with a few possibilities locally, I've kind-of >> grown to like the patch :) > > It seems to work, though I have no high hopes on test coverage from > bootstrap & regtest. The precommit CI only managed to test on arm, > aarch64 only built. > > If people are fine I'd push today or early tomorrow and will revert > on any fallout early next week. > > Does that sound OK?
Yeah, sounds good to me FWIW. Thanks, Richard