#4951: Performance regression 7.0.1 -> 7.0.1.20110201
--------------------------------------+-------------------------------------
Reporter: simonmar | Owner: simonpj
Type: bug | Status: new
Priority: highest | Milestone: 7.0.2
Component: Compiler | Version: 7.0.1
Resolution: | Keywords:
Testcase: | Blockedby:
Difficulty: | Os: Unknown/Multiple
Blocking: | Architecture: Unknown/Multiple
Failure: Runtime performance bug |
--------------------------------------+-------------------------------------
Comment(by simonpj):
I'm very puzzled. I've been looking at `imaginary/primes`, a very simple
benchmark. With GHC 6.12 -O I get
{{{
bash$ ./primes-612 4000 +RTS -s
37831
457,709,320 bytes allocated in the heap
82,966,936 bytes copied during GC
378,968 bytes maximum residency (29 sample(s))
89,944 bytes maximum slop
3 MB total memory in use (0 MB lost due to fragmentation)
Generation 0: 844 collections, 0 parallel, 0.99s, 0.96s elapsed
Generation 1: 29 collections, 0 parallel, 0.05s, 0.06s elapsed
INIT time 0.00s ( 0.00s elapsed)
MUT time 0.67s ( 0.69s elapsed)
GC time 1.04s ( 1.03s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 1.71s ( 1.72s elapsed)
}}}
With HEAD I get
{{{
bash$ ./primes 4000 +RTS -rprimes.ticky -s
37831
718,383,496 bytes allocated in the heap
79,051,592 bytes copied during GC
357,648 bytes maximum residency (25 sample(s))
88,384 bytes maximum slop
3 MB total memory in use (0 MB lost due to fragmentation)
Generation 0: 848 collections, 0 parallel, 0.95s, 0.95s elapsed
Generation 1: 25 collections, 0 parallel, 0.05s, 0.05s elapsed
INIT time 0.00s ( 0.00s elapsed)
MUT time 1.43s ( 1.43s elapsed)
GC time 1.00s ( 1.00s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 2.43s ( 2.43s elapsed)
}}}
Note the massive increase in allocation, and in mutator execution time.
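To put rough numbers on that increase, here is a small sketch that divides the HEAD figures by the 6.12 figures. The constants are copied from the two `+RTS -s` reports above; the names `allocated612`, `allocatedHead`, `mut612`, `mutHead` and `ratio` are made up for this illustration.

```haskell
-- Figures copied from the two `+RTS -s` reports above.
-- All names here are hypothetical, chosen only for this illustration.
allocated612, allocatedHead :: Double
allocated612  = 457709320   -- bytes allocated in the heap, GHC 6.12
allocatedHead = 718383496   -- bytes allocated in the heap, HEAD

mut612, mutHead :: Double
mut612  = 0.67              -- MUT time in seconds, GHC 6.12
mutHead = 1.43              -- MUT time in seconds, HEAD

-- How many times larger the new figure is than the old one.
ratio :: Double -> Double -> Double
ratio old new = new / old

main :: IO ()
main = do
  putStrLn ("allocation ratio (HEAD / 6.12): " ++ show (ratio allocated612 allocatedHead))
  putStrLn ("MUT time ratio   (HEAD / 6.12): " ++ show (ratio mut612 mutHead))
```

That works out to roughly 1.57 times the allocation and a little over twice the mutator time, while GC time barely moves (1.04s against 1.00s).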
BUT when I look at the `-ddump-simpl` code I see virtually the same code.
(For HEAD I also used `-funfolding-use-threshold=9` to make `mod` inline,
since HEAD seems slightly less keen to inline. With this flag the two
Core dumps are close to identical.)
Moreover, I compiled both with `-ticky` (including all the libraries). The
allocation word counts from `-ticky` are practically the same for the two
programs!
So I'm stumped. Somewhere a lot of time and allocation is happening, but
`-ticky` isn't seeing it.
It's a really small program and (by the time we've done inlining) almost
all the code is in Main (though it still calls `GHC.List.filter`).
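For readers without the nofib sources to hand, the program is roughly of the following shape. This is a sketch reconstructed from memory, not a verbatim copy of `imaginary/primes`; the names `notDivisibleBy` and `sieveStep` are assumptions, and the real source may differ in detail. It does show where `mod` and `filter` come in.

```haskell
import System.Environment (getArgs)

-- True when n does not divide x. This is the call to `mod` that
-- the inlining threshold mentioned above is concerned with.
notDivisibleBy :: Int -> Int -> Bool
notDivisibleBy n x = x `mod` n /= 0

-- One sieve step: drop the head of the list and every multiple of it.
-- The `filter` here is the one from GHC.List.
sieveStep :: [Int] -> [Int]
sieveStep (n:ns) = filter (notDivisibleBy n) ns
sieveStep []     = []

-- Lazily sieve the naturals from 2 upwards; the head of each
-- successive list is the next prime.
primes :: [Int]
primes = map head (iterate sieveStep (iterate (+ 1) 2))

main :: IO ()
main = do
  [arg] <- getArgs            -- expects exactly one argument, e.g. 4000
  print (primes !! read arg)  -- zero-based index into the list of primes
```

If the sketch is faithful, running it with the argument `4000` should be consistent with the `37831` printed in both runs above.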
So where is that time and allocation going? Somewhere in the RTS?
Simon
--
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/4951#comment:8>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler