I could pinpoint one part of the problem. Please see the ticket: https://ghc.haskell.org/trac/ghc/ticket/14208. Here is the description that I wrote in the ticket:
In this particular case -O2 is 2x slower than -O0 and -O0 is 2x slower than runghc. Please see the github repo: https://github.com/harendra-kumar/ghc-perf to reproduce the issue. Readme file in the repo has instructions to reproduce. The issue seems to occur when the code is placed in a different module. When all the code is in the same module the problem does not occur. In that case -O2 is faster than -O0. However, when the code is split into two modules the performance gets inverted. Also, it does not occur always, when I tried to change the code to make it simpler for repro the problem did not occur. -harendra On 9 September 2017 at 14:08, Harendra Kumar <[email protected]> wrote: > The code is at: https://github.com/harendra-kumar/asyncly. The benchmark > code is in "benchmark/Main.hs". The relevant function is "asyncly_basic". > > If you want to run it, you can use the following steps to reproduce the > behavior I reported below: > > 1) Run "stack build" > 2) Run "stack runghc benchmark/Main.hs" for runghc figures > 3) Run "stack ghc benchmark/Main.hs && benchmark/Main" to compile and run > normally > 4) Run "stack ghc -- -O2 benchmark/Main.hs && benchmark/Main" to compile > and run with -O2 flag > > Just look at the first benchmark (asyncly-serial), you can comment out all > others if you want to. Note that the library gets compiled without any > optimization flags (see the ghc options in the cabal file). So what we are > seeing here is just the effect of -O2 on compiling benchmarks/Main.hs. > > I am also trying to isolate the problem to a minimal case. I tried > removing all the INLINE pragmas in the library to make sure that I am not > screwing it up by asking the compiler to inline aggressively, but that does > not seem to make any difference to the situation. Let me know if you need > any information from me or help in running it. > > There are three issues that I am trying to get answers for: > > 1) Why runghc is faster? It means that there is a possibility for the > program to run as fast as runghc runs it. How do I get that performance or > an explanation of it? > > 2) Why -O1/O2 degrades performance so much by 4-5x. > > 3) The third one is the original problem that I posted in this thread, > compiler is unable to match manual inlining. It is possible that this is an > issue only when -O1/O2 is used and not when -O0 is used. > > Thanks for the help. > > -harendra > > > On 9 September 2017 at 13:30, Matthew Pickering < > [email protected]> wrote: > >> Do you have the code? >> >> On Sat, Sep 9, 2017 at 6:05 AM, Harendra Kumar <[email protected]> >> wrote: >> > While trying to come up with a minimal example I discovered one more >> > puzzling thing. runghc is fastest, ghc is slower, ghc with optimization >> is >> > slowest. This is completely reverse of the expected order. >> > >> > ghc -O1 (-O2 is similar): >> > >> > time 15.23 ms (14.72 ms .. 15.73 ms) >> > >> > ghc -O0: >> > >> > time 3.612 ms (3.548 ms .. 3.728 ms) >> > >> > runghc: >> > >> > time 2.250 ms (2.156 ms .. 2.348 ms) >> > >> > >> > I am grokking it further. Any pointers will be helpful. I understand >> that >> > -O2 can sometimes be slower e.g. aggressive inlining can sometimes be >> > counterproductive. But 4x variation is a lot and this is the case with >> -O1 >> > as well which should be relatively safer than -O2 in general. Worst of >> all >> > runghc is significantly faster than ghc. What's going on? >> > >> > -harendra >> > >> > >> > On 8 September 2017 at 18:49, Harendra Kumar <[email protected]> >> > wrote: >> >> >> >> I will try creating a minimal example and open a ticket for the >> inlining >> >> problem, the one I am sure about. >> >> >> >> -harendra >> >> >> >> On 8 September 2017 at 18:35, Simon Peyton Jones < >> [email protected]> >> >> wrote: >> >>> >> >>> I know that this is not an easy request, but can either of you >> produce a >> >>> small example that demonstrates your problem? If so, please open a >> ticket. >> >>> >> >>> >> >>> >> >>> I don’t like hearing about people having to use trial and error with >> >>> INLINE or SPECIALISE pragmas. But I can’t even begin to solve the >> problem >> >>> unless I can reproduce it. >> >>> >> >>> >> >>> >> >>> Simon >> >>> >> >>> >> >>> >> >>> From: ghc-devs [mailto:[email protected]] On Behalf Of >> >>> Harendra Kumar >> >>> Sent: 08 September 2017 13:50 >> >>> To: Mikolaj Konarski <[email protected]> >> >>> Cc: [email protected] >> >>> Subject: Re: Performance degradation when factoring out common code >> >>> >> >>> >> >>> >> >>> I should also point out that I saw performance improvements by >> manually >> >>> factoring out and propagating some common expressions to outer loops >> in >> >>> performance sensitive paths. Now I have made this a habit to do this >> >>> manually. Not sure if something like this has also been fixed with >> that >> >>> ticket or some other ticket. >> >>> >> >>> >> >>> >> >>> -harendra >> >>> >> >>> >> >>> >> >>> On 8 September 2017 at 17:34, Harendra Kumar < >> [email protected]> >> >>> wrote: >> >>> >> >>> Thanks Mikolaj! I have seen some surprising behavior quite a few times >> >>> recently and I was wondering whether GHC should do better. In one >> case I had >> >>> to use SPECIALIZE very aggressively, in another version of the same >> code it >> >>> worked well without that. I have been doing a lot of trial and error >> with >> >>> the INLINE/NOINLINE pragmas to figure out what the right combination >> is. >> >>> Sometimes it just feels like black magic, because I cannot find a >> rationale >> >>> to explain the behavior. I am not sure if there are any more such >> problems >> >>> lurking in, perhaps this is an area where some improvement looks >> possible. >> >>> >> >>> >> >>> >> >>> -harendra >> >>> >> >>> >> >>> >> >>> >> >>> >> >>> On 8 September 2017 at 17:10, Mikolaj Konarski >> >>> <[email protected]> wrote: >> >>> >> >>> Hello, >> >>> >> >>> I've had a similar problem that's been fixed in 8.2.1: >> >>> >> >>> https://ghc.haskell.org/trac/ghc/ticket/12603 >> >>> >> >>> You can also use some extreme global flags, such as >> >>> >> >>> ghc-options: -fexpose-all-unfoldings -fspecialise-aggressively >> >>> >> >>> to get most the GHC subtlety and shyness out of the way >> >>> when experimenting. >> >>> >> >>> Good luck >> >>> Mikolaj >> >>> >> >>> >> >>> >> >>> >> >>> On Fri, Sep 8, 2017 at 11:21 AM, Harendra Kumar >> >>> <[email protected]> wrote: >> >>> > Hi, >> >>> > >> >>> > I have this code snippet for the bind implementation of a Monad: >> >>> > >> >>> > AsyncT m >>= f = AsyncT $ \_ stp yld -> >> >>> > let run x = (runAsyncT x) Nothing stp yld >> >>> > yield a _ Nothing = run $ f a >> >>> > yield a _ (Just r) = run $ f a <> (r >>= f) >> >>> > in m Nothing stp yield >> >>> > >> >>> > I want to have multiple versions of this implementation >> parameterized >> >>> > by a >> >>> > function, like this: >> >>> > >> >>> > bindWith k (AsyncT m) f = AsyncT $ \_ stp yld -> >> >>> > let run x = (runAsyncT x) Nothing stp yld >> >>> > yield a _ Nothing = run $ f a >> >>> > yield a _ (Just r) = run $ f a `k` (bindWith k r f) >> >>> > in m Nothing stp yield >> >>> > >> >>> > And then the bind function becomes: >> >>> > >> >>> > (>>=) = bindWith (<>) >> >>> > >> >>> > But this leads to a performance degradation of more than 10%. >> inlining >> >>> > does >> >>> > not help, I tried INLINE pragma as well as the "inline" GHC >> builtin. I >> >>> > thought this should be a more or less straightforward replacement >> >>> > making the >> >>> > second version equivalent to the first one. But apparently there is >> >>> > something going on here that makes it perform worse. >> >>> > >> >>> > I did not look at the core, stg or asm yet. Hoping someone can >> quickly >> >>> > comment on it. Any ideas why is it so? Can this be worked around >> >>> > somehow? >> >>> > >> >>> > Thanks, >> >>> > Harendra >> >>> > >> >>> >> >>> > _______________________________________________ >> >>> > ghc-devs mailing list >> >>> > [email protected] >> >>> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >> >>> > >> >>> >> >>> >> >>> >> >>> >> >> >> >> >> > >> > >> > _______________________________________________ >> > ghc-devs mailing list >> > [email protected] >> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >> > >> > >
_______________________________________________ ghc-devs mailing list [email protected] http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
