Jeff Law wrote:
> On 09/06/2017 03:55 AM, Jackson Woodruff wrote:
> > On 08/30/2017 01:46 PM, Richard Biener wrote:

>>>   rdivtmp = 1 / (y*C);
>>>   tem = x *rdivtmp;
>>>   tem2= z * rdivtmp;
>>>
>>> instead of
>>>
>>>   rdivtmp = 1/y;
>>>   tem = x * 1/C * rdivtmp;
>>>   tem2 = z * 1/C * rdivtmp;
>> 
>> Ideally we would be able to CSE that into
>> 
>> rdivtmp = 1/y * 1/C;
>> tem = x * rdivtmp;
>> tem2 = z * rdivtmp;
> So why is your sequence significantly better than Richi's desired
> seqeuence?  They both seem to need 3 mults and a division (which in both
> cases might be a reciprocal estimation).    In Richi's sequence we have
> to mult and feed the result as an operand into the reciprocal insn.  In
> yours we feed the result of the reciprocal into the multiply.

Basically this stuff happens a lot in real code, which is exactly why I 
proposed it.
I even provided counts of how many divisions each transformation avoids:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71026. 

Note this transformation is a canonicalization - if you can't merge a
constant somehow, moving it out to the LHS can expose more
opportunities, like in (C1 * x) / (C2 * y) -> (C1 * x * C2) / y -> (C3 * x) / y.
Same for negation as it behaves like * -1.

The key here is that it is at least an order of magnitude worse if you have
to execute an extra division than if you have an extra multiply.

> ISTM in the end if  y*C or 1/(y*C) is a CSE, then Richi's sequence wins.
> Similarly if 1/y is a CSE, then yours wins.  Is there some reason to
> believe that one is a more likely CSE than the other?

The idea is that 1/y is much more likely a CSE than 1/(y*C).

We could make the pattern only fire in single use cases and see whether
that makes a difference. It would be easy to test old vs new vs single-use
new and count how many divisions we end up with. We could add 1/ (y * C)
to the reciprocal phase if it is unacceptable as a canonicalization, but then
you won't be able to optimize (C1 * x * C2) / y.

> I think there's a fundamental phase ordering problem here.  You want to
> CSE stuff as much as possible, then expose reciprocals, then CSE again
> because exposing reciprocals can expose new CSE opportunities.

I agree there are phase ordering issues and various problems in
reassociation, CSE and division optimizations not being able to find and
optimize complex cases that are worthwhile.

However I don't agree doing CSE before reciprocals is a good idea. We
want to expose reciprocals early on, even if that means we find fewer
CSEs as a result - again because division is so much more expensive than
any other operation. CSE is generally not smart enough to CSE a * x in
a * b * x and a * c * x, something which is likely to happen quite frequently -
unlike the made up division examples here.

> And I suspect that no matter how hard we try, there's going to be cases
> that get exposed by various transformations in the pipeline such that to
> fully optimize the cse - reciprocal - cse sequence would need to be
> repeated to fully optimize.  We may have to live with not being
> particularly good at picking up the those second order effects.
> 
> I do think that the need for cse - reciprocal - cse  sequencing suggests
> that match.pd may not be the right place for these transformations.  I
> think Richi has pointed this out a couple times already.

I don't think you can ever expect to find the optimal case by repeating
optimizations. It's quite easy to construct examples where first doing CSE
makes things significantly worse. Ultimately to get something optimal you'd
need to try lots of permutations and count for each possible permutation
how many multiplies and divisons you end up after full optimization.
Quite impractical...

> I'm not going to accept or reject at this time.  I think we need to make
> a higher level decision.  Are these transformations better suited for
> match.pd or the reciprocal transformation pass?  I realize that some
> patterns are already in match.pd, but let's try to settle the higher
> level issue before we add more.

The first question is whether you see it as a canonicalization. If so, then
match.pd should be fine.

Wilco

Reply via email to