Re: Extend tree-call-cdce to calls whose result is used

Richard Biener Mon, 16 Nov 2015 06:02:11 -0800

On Fri, Nov 13, 2015 at 2:12 PM, Richard Sandiford
<richard.sandif...@arm.com> wrote:
> Richard Biener <richard.guent...@gmail.com> writes:
>> On Mon, Nov 9, 2015 at 10:03 PM, Michael Matz <m...@suse.de> wrote:
>>> Hi,
>>>
>>> On Mon, 9 Nov 2015, Richard Sandiford wrote:
>>>
>>>> +static bool
>>>> +can_use_internal_fn (gcall *call)
>>>> +{
>>>> +  /* Only replace calls that set errno.  */
>>>> +  if (!gimple_vdef (call))
>>>> +    return false;
>>>
>>> Oh, I managed to confuse this in my head while reading the patch.  So,
>>> hmm, you don't actually replace the builtin with an internal function
>>> (without the condition) under no-errno-math?  Does something else do that?
>>> Because otherwise that seems an unnecessary restriction?
>>>
>>>> >> r229916 fixed that for the non-EH case.
>>>> >
>>>> > Ah, missed it.  Even the EH case shouldn't be difficult.  If the
>>>> > original dominator of the EH destination was the call block it moves,
>>>> > otherwise it remains unchanged.
>>>>
>>>> The target of the edge is easy in itself, I agree, but that isn't
>>>> necessarily the only affected block, if the EH handler doesn't
>>>> exit or rethrow.
>>>
>>> You're worried the non-EH and the EH regions merge again, right?  Like so:
>>>
>>> before change:
>>>
>>>         BB1: throwing-call
>>>      fallthru/           \EH
>>>     BB2                   BBeh
>>>      |                   /....\ (stuff in EH-region)
>>>      |                     /    some path out of EH region
>>>      | /------------------/
>>>     BB3
>>>
>>> Here, BB3 must at least be dominated by BB1 (the throwing block), or by
>>> something further up (when there are other side-entries to the path
>>> BB2->BB3 or into the EH region).  When further up, nothing changes, when
>>> it's BB1, then it's afterwards dominated by the BB containing the
>>> condition.  So everything with idom==BB1 gets idom=Bcond, except for BBeh,
>>> which gets idom=Bcall.  Depending on how you split BB1, either Bcond or
>>> BBcall might still be BB1 and doesn't lead to changes in the dom tree.
>>>
>>>> > Currently we have quite some of such passes (reassoc, forwprop,
>>>> > lower_vector_ssa, cse_reciprocals, cse_sincos (sigh!), optimize_bswap
>>>> > and others), but they are all handling only special situations in one
>>>> > way or the other.  pass_fold_builtins is another one, but it seems
>>>> > most related to what you want (replacing a call with something else),
>>>> > so I thought that'd be the natural choice.
>>>>
>>>> Well, to be pedantic, it's not really replacing the call.  Except for
>>>> the special case of targets that support direct assignments to errno,
>>>> it keeps the original call but ensures that it isn't usually executed.
>>>> From that point of view it doesn't really seem like a fold.
>>>>
>>>> But I suppose that's just naming again :-).  And it's easily solved with
>>>> s/fold/rewrite/.
>>>
>>> Exactly, in my mind pass_fold_builtin (like many of the others I
>>> mentioned) doesn't do folding but rewriting :)
>>
>> So I am replying here to the issue of where to do the transform call_cdce
>> does and the one Richard wants to add.  For example we "lower"
>> posix_memalign as early as GIMPLE lowering (that's before CFG construction).
>> We also lower sincos to cexpi during GENERIC folding (or if that is dropped
>> either GIMPLE lowering or GIMPLE folding during gimplification would be
>> appropriate).
>>
>> Now, with offloading we have to avoid creating target dependencies before
>> LTO stream-out (thus no IFN replacements before that - not sure if
>> Richards patches have an issue there already).
>
> No, this patch was the earliest point at which we converted to internal
> functions.  The idea was to make code treat ECF_PURE built-in functions
> and internal functions as being basically equivalent.  There's therefore
> not much benefit to doing a straight replacement of one with the other
> during either GENERIC or gimple.  Instead the series only used internal
> functions for things that built-in functions couldn't do, specifically:
>
> - the case used in this patch, to optimise part of a non-pure built-in
>   function using a pure equivalent.
>
> - vector versions of built-in functions.
>
> The cfgexpand patch makes sure that pure built-in functions are expanded
> like internal functions where possible.
>
>> Which would leave us with a lowering stage early in the main
>> optimization pipeline - I think fold_builtins pass is way too late but
>> any "folding" pass will do (like forwprop or backprop where the latter
>> might be better because it might end up computing FP "ranges" to
>> improve the initial lowering code).
>
> This isn't at all related to what backprop is doing though.
> backprop is about optimising definitions based on information
> about all uses.
>
> Does fold_builtins need to be as late as it is?


Not sure.  It folds remaining __builtin_constant_p stuff for example.

>> Of course call_cdce is as good as long as it still exists.
>
> Does this meann that you're not against the patch in principle
> (i.e. keeping call_cdce for now and extending it in the way that
> this patch does)?

Yes, I'm fine with extending call_cdce.  Of course I'd happily
approve a patch dissolving it into somewhere where it makes more
sense.  But this shouldn't block this patch.

Richard.

>
> Thanks,
> Richard
>

Re: Extend tree-call-cdce to calls whose result is used

Reply via email to