While trying to come up with a minimal example I discovered one more puzzling thing. runghc is fastest, ghc is slower, ghc with optimization is slowest. This is completely reverse of the expected order.
ghc -O1 (-O2 is similar): time 15.23 ms (14.72 ms .. 15.73 ms) ghc -O0: time 3.612 ms (3.548 ms .. 3.728 ms) runghc: time 2.250 ms (2.156 ms .. 2.348 ms) I am grokking it further. Any pointers will be helpful. I understand that -O2 can sometimes be slower e.g. aggressive inlining can sometimes be counterproductive. But 4x variation is a lot and this is the case with -O1 as well which should be relatively safer than -O2 in general. Worst of all runghc is significantly faster than ghc. What's going on? -harendra On 8 September 2017 at 18:49, Harendra Kumar <[email protected]> wrote: > I will try creating a minimal example and open a ticket for the inlining > problem, the one I am sure about. > > -harendra > > On 8 September 2017 at 18:35, Simon Peyton Jones <[email protected]> > wrote: > >> *I know that this is not an easy request*, but can either of you produce >> a small example that demonstrates your problem? If so, please open a >> ticket. >> >> >> >> I don’t like hearing about people having to use trial and error with >> INLINE or SPECIALISE pragmas. But I can’t even begin to solve the problem >> unless I can reproduce it. >> >> >> >> Simon >> >> >> >> *From:* ghc-devs [mailto:[email protected]] *On Behalf Of >> *Harendra >> Kumar >> *Sent:* 08 September 2017 13:50 >> *To:* Mikolaj Konarski <[email protected]> >> *Cc:* [email protected] >> *Subject:* Re: Performance degradation when factoring out common code >> >> >> >> I should also point out that I saw performance improvements by manually >> factoring out and propagating some common expressions to outer loops in >> performance sensitive paths. Now I have made this a habit to do this >> manually. Not sure if something like this has also been fixed with that >> ticket or some other ticket. >> >> >> >> -harendra >> >> >> >> On 8 September 2017 at 17:34, Harendra Kumar <[email protected]> >> wrote: >> >> Thanks Mikolaj! I have seen some surprising behavior quite a few times >> recently and I was wondering whether GHC should do better. In one case I >> had to use SPECIALIZE very aggressively, in another version of the same >> code it worked well without that. I have been doing a lot of trial and >> error with the INLINE/NOINLINE pragmas to figure out what the right >> combination is. Sometimes it just feels like black magic, because I cannot >> find a rationale to explain the behavior. I am not sure if there are any >> more such problems lurking in, perhaps this is an area where some >> improvement looks possible. >> >> >> >> -harendra >> >> >> >> >> >> On 8 September 2017 at 17:10, Mikolaj Konarski < >> [email protected]> wrote: >> >> Hello, >> >> I've had a similar problem that's been fixed in 8.2.1: >> >> https://ghc.haskell.org/trac/ghc/ticket/12603 >> >> You can also use some extreme global flags, such as >> >> ghc-options: -fexpose-all-unfoldings -fspecialise-aggressively >> >> to get most the GHC subtlety and shyness out of the way >> when experimenting. >> >> Good luck >> Mikolaj >> >> >> >> >> On Fri, Sep 8, 2017 at 11:21 AM, Harendra Kumar >> <[email protected]> wrote: >> > Hi, >> > >> > I have this code snippet for the bind implementation of a Monad: >> > >> > AsyncT m >>= f = AsyncT $ \_ stp yld -> >> > let run x = (runAsyncT x) Nothing stp yld >> > yield a _ Nothing = run $ f a >> > yield a _ (Just r) = run $ f a <> (r >>= f) >> > in m Nothing stp yield >> > >> > I want to have multiple versions of this implementation parameterized >> by a >> > function, like this: >> > >> > bindWith k (AsyncT m) f = AsyncT $ \_ stp yld -> >> > let run x = (runAsyncT x) Nothing stp yld >> > yield a _ Nothing = run $ f a >> > yield a _ (Just r) = run $ f a `k` (bindWith k r f) >> > in m Nothing stp yield >> > >> > And then the bind function becomes: >> > >> > (>>=) = bindWith (<>) >> > >> > But this leads to a performance degradation of more than 10%. inlining >> does >> > not help, I tried INLINE pragma as well as the "inline" GHC builtin. I >> > thought this should be a more or less straightforward replacement >> making the >> > second version equivalent to the first one. But apparently there is >> > something going on here that makes it perform worse. >> > >> > I did not look at the core, stg or asm yet. Hoping someone can quickly >> > comment on it. Any ideas why is it so? Can this be worked around >> somehow? >> > >> > Thanks, >> > Harendra >> > >> >> > _______________________________________________ >> > ghc-devs mailing list >> > [email protected] >> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >> <https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmail.haskell.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fghc-devs&data=02%7C01%7Csimonpj%40microsoft.com%7C5ff3c69fb9d447c47b5908d4f6b832de%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636404718373134824&sdata=zyHYozym6TzL61Tq5CSERjqhKlxr%2ByV0j%2FyHtxmXmVE%3D&reserved=0> >> > >> >> >> >> >> > >
_______________________________________________ ghc-devs mailing list [email protected] http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
