Re: [PATCH] Factor out division by squares and remove division around comparisons (1/2)

Bernhard Reutner-Fischer Sat, 16 Sep 2017 14:39:36 -0700

On 15 September 2017 18:50:26 CEST, Jeff Law <l...@redhat.com> wrote:
>On 09/13/2017 03:20 PM, Wilco Dijkstra wrote:
>> Jeff Law wrote:
>>> On 09/06/2017 03:55 AM, Jackson Woodruff wrote:
>>>> On 08/30/2017 01:46 PM, Richard Biener wrote:
>> 
>>>>>    rdivtmp = 1 / (y*C);
>>>>>    tem = x *rdivtmp;
>>>>>    tem2= z * rdivtmp;
>>>>>
>>>>> instead of
>>>>>
>>>>>    rdivtmp = 1/y;
>>>>>    tem = x * 1/C * rdivtmp;
>>>>>    tem2 = z * 1/C * rdivtmp;
>>>>
>>>> Ideally we would be able to CSE that into
>>>>
>>>> rdivtmp = 1/y * 1/C;
>>>> tem = x * rdivtmp;
>>>> tem2 = z * rdivtmp;
>>> So why is your sequence significantly better than Richi's desired
>>> seqeuence?  They both seem to need 3 mults and a division (which in
>both
>>> cases might be a reciprocal estimation).    In Richi's sequence we
>have
>>> to mult and feed the result as an operand into the reciprocal insn. 
>In
>>> yours we feed the result of the reciprocal into the multiply.
>> 
>> Basically this stuff happens a lot in real code, which is exactly why
>I proposed it.
>> I even provided counts of how many divisions each transformation
>avoids:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71026. 
>I don't doubt that it happens in real code.  There's a reason why MIPS
>IV added recip and rsqrt 20 years ago.  Our ability to exploit them has
>always been fairly limited though.
>
>What I'm trying to avoid is a transformation where the two forms are
>roughly equal in terms of what they expose vs what they hide.
>
>If pulling the 1/c out is consistently better, then that's obviously
>good.  If it's essentially a toss-up because of the other interactions
>with CSE, then we need to think about it more deeply.
>
>I _suspect_ that pulling the 1/c out is generally better, but something
>more concrete than my intuition would be helpful.
>
>
>
>
>> 
>> Note this transformation is a canonicalization - if you can't merge a
>> constant somehow, moving it out to the LHS can expose more
>> opportunities, like in (C1 * x) / (C2 * y) -> (C1 * x * C2) / y ->
>(C3 * x) / y.
>> Same for negation as it behaves like * -1.
>FWIW, I generally agree.
>
>> 
>> The key here is that it is at least an order of magnitude worse if
>you have
>> to execute an extra division than if you have an extra multiply.
>No doubt.  I'll trade a division for a multiply any day.  Similarly 1/C
>is just a constant, so I consider it essentially free.
>
>
>> 
>>> ISTM in the end if  y*C or 1/(y*C) is a CSE, then Richi's sequence
>wins.
>>>  Similarly if 1/y is a CSE, then yours wins.  Is there some reason
>to
>>> believe that one is a more likely CSE than the other?
>> 
>> The idea is that 1/y is much more likely a CSE than 1/(y*C).
>Do we have anything other than intuition to back that up?
>> 
>> We could make the pattern only fire in single use cases and see
>whether
>> that makes a difference. It would be easy to test old vs new vs
>single-use
>> new and count how many divisions we end up with. We could add 1/ (y *
>C)
>> to the reciprocal phase if it is unacceptable as a canonicalization,
>but then
>> you won't be able to optimize (C1 * x * C2) / y.
>We could.  I think the question would then become does restricting to
>the single-use case kill too many opportunities.
>
>Sigh.  I think the 4 of us could go round and round on this forever in
>the pursuit of the perfect code.  Though in reality I'm happy with a
>clear improvement.



Btw reminds me a little bit of 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28417
which IIRC never was applied. Maybe someone could talk Denys (a colleague of 
yours nowadays, Jeff) into submitting a real patch? ;)

Cheers,
>
>> 
>>> I think there's a fundamental phase ordering problem here.  You want
>to
>>> CSE stuff as much as possible, then expose reciprocals, then CSE
>again
>>> because exposing reciprocals can expose new CSE opportunities.
>> 
>> I agree there are phase ordering issues and various problems in
>> reassociation, CSE and division optimizations not being able to find
>and
>> optimize complex cases that are worthwhile.
>> 
>> However I don't agree doing CSE before reciprocals is a good idea. We
>> want to expose reciprocals early on, even if that means we find fewer
>> CSEs as a result - again because division is so much more expensive
>than
>> any other operation. CSE is generally not smart enough to CSE a * x
>in
>> a * b * x and a * c * x, something which is likely to happen quite
>frequently -
>> unlike the made up division examples here.
>We have much stronger reassociation capabilities for multiplication --
>it's a well understood problem and if we fail to pick up a * x across
>those two kind of expressions, I'd consider our reassociation code as
>failing pretty badly, particularly for integer types.
>
>BUt yes, division is expensive.  And I'm all for a tranformation that
>turns a division into a multiplication.  That's almost always a win.
>
>> 
>> The first question is whether you see it as a canonicalization. If
>so, then
>> match.pd should be fine.
>Pulling the constant part out of the denominator and turning it into a
>multiplication by the recip constant should likely be considered
>canonicalization.  I think Richi largely agreed to this in the thread
>as
>well and asked for that hunk of the patch to be extracted and submitted
>independent of the other changes so that it could go ahead and move
>forward while we figure out the rest.
>
>Note that tree-ssa-reassoc.c has a fair amount of this kind of
>canonicalization.  In fact, I think that's where we handle things like
>pulling out negations.  You may actually do better handling it there.
>
>
>
>Jeff

Re: [PATCH] Factor out division by squares and remove division around comparisons (1/2)

Reply via email to