On 15 September 2017 18:50:26 CEST, Jeff Law <l...@redhat.com> wrote: >On 09/13/2017 03:20 PM, Wilco Dijkstra wrote: >> Jeff Law wrote: >>> On 09/06/2017 03:55 AM, Jackson Woodruff wrote: >>>> On 08/30/2017 01:46 PM, Richard Biener wrote: >> >>>>> rdivtmp = 1 / (y*C); >>>>> tem = x *rdivtmp; >>>>> tem2= z * rdivtmp; >>>>> >>>>> instead of >>>>> >>>>> rdivtmp = 1/y; >>>>> tem = x * 1/C * rdivtmp; >>>>> tem2 = z * 1/C * rdivtmp; >>>> >>>> Ideally we would be able to CSE that into >>>> >>>> rdivtmp = 1/y * 1/C; >>>> tem = x * rdivtmp; >>>> tem2 = z * rdivtmp; >>> So why is your sequence significantly better than Richi's desired >>> seqeuence? They both seem to need 3 mults and a division (which in >both >>> cases might be a reciprocal estimation). In Richi's sequence we >have >>> to mult and feed the result as an operand into the reciprocal insn. >In >>> yours we feed the result of the reciprocal into the multiply. >> >> Basically this stuff happens a lot in real code, which is exactly why >I proposed it. >> I even provided counts of how many divisions each transformation >avoids: >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71026. >I don't doubt that it happens in real code. There's a reason why MIPS >IV added recip and rsqrt 20 years ago. Our ability to exploit them has >always been fairly limited though. > >What I'm trying to avoid is a transformation where the two forms are >roughly equal in terms of what they expose vs what they hide. > >If pulling the 1/c out is consistently better, then that's obviously >good. If it's essentially a toss-up because of the other interactions >with CSE, then we need to think about it more deeply. > >I _suspect_ that pulling the 1/c out is generally better, but something >more concrete than my intuition would be helpful. > > > > >> >> Note this transformation is a canonicalization - if you can't merge a >> constant somehow, moving it out to the LHS can expose more >> opportunities, like in (C1 * x) / (C2 * y) -> (C1 * x * C2) / y -> >(C3 * x) / y. >> Same for negation as it behaves like * -1. >FWIW, I generally agree. > >> >> The key here is that it is at least an order of magnitude worse if >you have >> to execute an extra division than if you have an extra multiply. >No doubt. I'll trade a division for a multiply any day. Similarly 1/C >is just a constant, so I consider it essentially free. > > >> >>> ISTM in the end if y*C or 1/(y*C) is a CSE, then Richi's sequence >wins. >>> Similarly if 1/y is a CSE, then yours wins. Is there some reason >to >>> believe that one is a more likely CSE than the other? >> >> The idea is that 1/y is much more likely a CSE than 1/(y*C). >Do we have anything other than intuition to back that up? >> >> We could make the pattern only fire in single use cases and see >whether >> that makes a difference. It would be easy to test old vs new vs >single-use >> new and count how many divisions we end up with. We could add 1/ (y * >C) >> to the reciprocal phase if it is unacceptable as a canonicalization, >but then >> you won't be able to optimize (C1 * x * C2) / y. >We could. I think the question would then become does restricting to >the single-use case kill too many opportunities. > >Sigh. I think the 4 of us could go round and round on this forever in >the pursuit of the perfect code. Though in reality I'm happy with a >clear improvement.
Btw reminds me a little bit of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28417 which IIRC never was applied. Maybe someone could talk Denys (a colleague of yours nowadays, Jeff) into submitting a real patch? ;) Cheers, > >> >>> I think there's a fundamental phase ordering problem here. You want >to >>> CSE stuff as much as possible, then expose reciprocals, then CSE >again >>> because exposing reciprocals can expose new CSE opportunities. >> >> I agree there are phase ordering issues and various problems in >> reassociation, CSE and division optimizations not being able to find >and >> optimize complex cases that are worthwhile. >> >> However I don't agree doing CSE before reciprocals is a good idea. We >> want to expose reciprocals early on, even if that means we find fewer >> CSEs as a result - again because division is so much more expensive >than >> any other operation. CSE is generally not smart enough to CSE a * x >in >> a * b * x and a * c * x, something which is likely to happen quite >frequently - >> unlike the made up division examples here. >We have much stronger reassociation capabilities for multiplication -- >it's a well understood problem and if we fail to pick up a * x across >those two kind of expressions, I'd consider our reassociation code as >failing pretty badly, particularly for integer types. > >BUt yes, division is expensive. And I'm all for a tranformation that >turns a division into a multiplication. That's almost always a win. > >> >> The first question is whether you see it as a canonicalization. If >so, then >> match.pd should be fine. >Pulling the constant part out of the denominator and turning it into a >multiplication by the recip constant should likely be considered >canonicalization. I think Richi largely agreed to this in the thread >as >well and asked for that hunk of the patch to be extracted and submitted >independent of the other changes so that it could go ahead and move >forward while we figure out the rest. > >Note that tree-ssa-reassoc.c has a fair amount of this kind of >canonicalization. In fact, I think that's where we handle things like >pulling out negations. You may actually do better handling it there. > > > >Jeff