On Fri, Nov 13, 2015 at 2:12 PM, Richard Sandiford <richard.sandif...@arm.com> wrote: > Richard Biener <richard.guent...@gmail.com> writes: >> On Mon, Nov 9, 2015 at 10:03 PM, Michael Matz <m...@suse.de> wrote: >>> Hi, >>> >>> On Mon, 9 Nov 2015, Richard Sandiford wrote: >>> >>>> +static bool >>>> +can_use_internal_fn (gcall *call) >>>> +{ >>>> + /* Only replace calls that set errno. */ >>>> + if (!gimple_vdef (call)) >>>> + return false; >>> >>> Oh, I managed to confuse this in my head while reading the patch. So, >>> hmm, you don't actually replace the builtin with an internal function >>> (without the condition) under no-errno-math? Does something else do that? >>> Because otherwise that seems an unnecessary restriction? >>> >>>> >> r229916 fixed that for the non-EH case. >>>> > >>>> > Ah, missed it. Even the EH case shouldn't be difficult. If the >>>> > original dominator of the EH destination was the call block it moves, >>>> > otherwise it remains unchanged. >>>> >>>> The target of the edge is easy in itself, I agree, but that isn't >>>> necessarily the only affected block, if the EH handler doesn't >>>> exit or rethrow. >>> >>> You're worried the non-EH and the EH regions merge again, right? Like so: >>> >>> before change: >>> >>> BB1: throwing-call >>> fallthru/ \EH >>> BB2 BBeh >>> | /....\ (stuff in EH-region) >>> | / some path out of EH region >>> | /------------------/ >>> BB3 >>> >>> Here, BB3 must at least be dominated by BB1 (the throwing block), or by >>> something further up (when there are other side-entries to the path >>> BB2->BB3 or into the EH region). When further up, nothing changes, when >>> it's BB1, then it's afterwards dominated by the BB containing the >>> condition. So everything with idom==BB1 gets idom=Bcond, except for BBeh, >>> which gets idom=Bcall. Depending on how you split BB1, either Bcond or >>> BBcall might still be BB1 and doesn't lead to changes in the dom tree. >>> >>>> > Currently we have quite some of such passes (reassoc, forwprop, >>>> > lower_vector_ssa, cse_reciprocals, cse_sincos (sigh!), optimize_bswap >>>> > and others), but they are all handling only special situations in one >>>> > way or the other. pass_fold_builtins is another one, but it seems >>>> > most related to what you want (replacing a call with something else), >>>> > so I thought that'd be the natural choice. >>>> >>>> Well, to be pedantic, it's not really replacing the call. Except for >>>> the special case of targets that support direct assignments to errno, >>>> it keeps the original call but ensures that it isn't usually executed. >>>> From that point of view it doesn't really seem like a fold. >>>> >>>> But I suppose that's just naming again :-). And it's easily solved with >>>> s/fold/rewrite/. >>> >>> Exactly, in my mind pass_fold_builtin (like many of the others I >>> mentioned) doesn't do folding but rewriting :) >> >> So I am replying here to the issue of where to do the transform call_cdce >> does and the one Richard wants to add. For example we "lower" >> posix_memalign as early as GIMPLE lowering (that's before CFG construction). >> We also lower sincos to cexpi during GENERIC folding (or if that is dropped >> either GIMPLE lowering or GIMPLE folding during gimplification would be >> appropriate). >> >> Now, with offloading we have to avoid creating target dependencies before >> LTO stream-out (thus no IFN replacements before that - not sure if >> Richards patches have an issue there already). > > No, this patch was the earliest point at which we converted to internal > functions. The idea was to make code treat ECF_PURE built-in functions > and internal functions as being basically equivalent. There's therefore > not much benefit to doing a straight replacement of one with the other > during either GENERIC or gimple. Instead the series only used internal > functions for things that built-in functions couldn't do, specifically: > > - the case used in this patch, to optimise part of a non-pure built-in > function using a pure equivalent. > > - vector versions of built-in functions. > > The cfgexpand patch makes sure that pure built-in functions are expanded > like internal functions where possible. > >> Which would leave us with a lowering stage early in the main >> optimization pipeline - I think fold_builtins pass is way too late but >> any "folding" pass will do (like forwprop or backprop where the latter >> might be better because it might end up computing FP "ranges" to >> improve the initial lowering code). > > This isn't at all related to what backprop is doing though. > backprop is about optimising definitions based on information > about all uses. > > Does fold_builtins need to be as late as it is?
Not sure. It folds remaining __builtin_constant_p stuff for example. >> Of course call_cdce is as good as long as it still exists. > > Does this meann that you're not against the patch in principle > (i.e. keeping call_cdce for now and extending it in the way that > this patch does)? Yes, I'm fine with extending call_cdce. Of course I'd happily approve a patch dissolving it into somewhere where it makes more sense. But this shouldn't block this patch. Richard. > > Thanks, > Richard >