Do you have the code?

On Sat, Sep 9, 2017 at 6:05 AM, Harendra Kumar <[email protected]> wrote:
> While trying to come up with a minimal example I discovered one more
> puzzling thing. runghc is fastest, ghc is slower, ghc with optimization is
> slowest. This is completely reverse of the expected order.
>
> ghc -O1 (-O2 is similar):
>
> time 15.23 ms (14.72 ms .. 15.73 ms)
>
> ghc -O0:
>
> time 3.612 ms (3.548 ms .. 3.728 ms)
>
> runghc:
>
> time 2.250 ms (2.156 ms .. 2.348 ms)
>
> I am grokking it further. Any pointers will be helpful. I understand that
> -O2 can sometimes be slower e.g. aggressive inlining can sometimes be
> counterproductive. But 4x variation is a lot and this is the case with -O1
> as well which should be relatively safer than -O2 in general. Worst of all
> runghc is significantly faster than ghc. What's going on?
>
> -harendra
>
> On 8 September 2017 at 18:49, Harendra Kumar <[email protected]>
> wrote:
>>
>> I will try creating a minimal example and open a ticket for the inlining
>> problem, the one I am sure about.
>>
>> -harendra
>>
>> On 8 September 2017 at 18:35, Simon Peyton Jones <[email protected]>
>> wrote:
>>>
>>> I know that this is not an easy request, but can either of you produce a
>>> small example that demonstrates your problem? If so, please open a ticket.
>>>
>>> I don’t like hearing about people having to use trial and error with
>>> INLINE or SPECIALISE pragmas. But I can’t even begin to solve the problem
>>> unless I can reproduce it.
>>>
>>> Simon
>>>
>>> From: ghc-devs [mailto:[email protected]] On Behalf Of Harendra Kumar
>>> Sent: 08 September 2017 13:50
>>> To: Mikolaj Konarski <[email protected]>
>>> Cc: [email protected]
>>> Subject: Re: Performance degradation when factoring out common code
>>>
>>> I should also point out that I saw performance improvements by manually
>>> factoring out and propagating some common expressions to outer loops in
>>> performance sensitive paths. Now I have made this a habit to do this
>>> manually. Not sure if something like this has also been fixed with that
>>> ticket or some other ticket.
>>>
>>> -harendra
>>>
>>> On 8 September 2017 at 17:34, Harendra Kumar <[email protected]>
>>> wrote:
>>>
>>> Thanks Mikolaj! I have seen some surprising behavior quite a few times
>>> recently and I was wondering whether GHC should do better. In one case I had
>>> to use SPECIALIZE very aggressively, in another version of the same code it
>>> worked well without that. I have been doing a lot of trial and error with
>>> the INLINE/NOINLINE pragmas to figure out what the right combination is.
>>> Sometimes it just feels like black magic, because I cannot find a rationale
>>> to explain the behavior. I am not sure if there are any more such problems
>>> lurking in, perhaps this is an area where some improvement looks possible.
>>>
>>> -harendra
>>>
>>> On 8 September 2017 at 17:10, Mikolaj Konarski
>>> <[email protected]> wrote:
>>>
>>> Hello,
>>>
>>> I've had a similar problem that's been fixed in 8.2.1:
>>>
>>> https://ghc.haskell.org/trac/ghc/ticket/12603
>>>
>>> You can also use some extreme global flags, such as
>>>
>>> ghc-options: -fexpose-all-unfoldings -fspecialise-aggressively
>>>
>>> to get most the GHC subtlety and shyness out of the way
>>> when experimenting.
>>>
>>> Good luck
>>> Mikolaj
>>>
>>> On Fri, Sep 8, 2017 at 11:21 AM, Harendra Kumar
>>> <[email protected]> wrote:
>>> > Hi,
>>> >
>>> > I have this code snippet for the bind implementation of a Monad:
>>> >
>>> > AsyncT m >>= f = AsyncT $ \_ stp yld ->
>>> >     let run x = (runAsyncT x) Nothing stp yld
>>> >         yield a _ Nothing = run $ f a
>>> >         yield a _ (Just r) = run $ f a <> (r >>= f)
>>> >     in m Nothing stp yield
>>> >
>>> > I want to have multiple versions of this implementation parameterized
>>> > by a function, like this:
>>> >
>>> > bindWith k (AsyncT m) f = AsyncT $ \_ stp yld ->
>>> >     let run x = (runAsyncT x) Nothing stp yld
>>> >         yield a _ Nothing = run $ f a
>>> >         yield a _ (Just r) = run $ f a `k` (bindWith k r f)
>>> >     in m Nothing stp yield
>>> >
>>> > And then the bind function becomes:
>>> >
>>> > (>>=) = bindWith (<>)
>>> >
>>> > But this leads to a performance degradation of more than 10%. inlining
>>> > does not help, I tried INLINE pragma as well as the "inline" GHC
>>> > builtin. I thought this should be a more or less straightforward
>>> > replacement making the second version equivalent to the first one. But
>>> > apparently there is something going on here that makes it perform worse.
>>> >
>>> > I did not look at the core, stg or asm yet. Hoping someone can quickly
>>> > comment on it. Any ideas why is it so? Can this be worked around
>>> > somehow?
>>> >
>>> > Thanks,
>>> > Harendra
>>> >
>>> > _______________________________________________
>>> > ghc-devs mailing list
>>> > [email protected]
>>> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
