#3553: parallel gc suffers badly if one thread is descheduled
---------------------------------+------------------------------------------
Reporter: simonmar | Owner: igloo
Type: merge | Status: new
Priority: normal | Milestone: 6.12.2
Component: Runtime System | Version: 6.10.4
Keywords: | Difficulty: Unknown
Os: Unknown/Multiple | Testcase:
Architecture: Unknown/Multiple | Failure: None/Unknown
---------------------------------+------------------------------------------
Comment(by simonmar):
Patch to use futexes attached. In elapsed time this is significantly
slower than the `sched_yield` version currently in use (4.11s versus 3.64s
total in the run below), even though it uses less CPU time. I don't know
why - as far as I can tell I'm using futexes correctly. The protocol I'm
using is from Drepper's paper, and I tried it with and without some
user-space spinning in the acquire case.
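For readers without the attachment, here is a minimal sketch of the kind of three-state futex lock described in Drepper's "Futexes Are Tricky" (0 = free, 1 = held, 2 = held with possible waiters), with an optional user-space spin before sleeping. It is not the attached patch: the names `futex_lock`, `futex_lock_acquire`, `futex_lock_release` and the `spin_limit` parameter are made up for illustration, and the code assumes Linux with C11 atomics.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical lock type: 0 = free, 1 = held, 2 = held, maybe waiters. */
typedef struct { atomic_int state; } futex_lock;

/* Thin wrapper around the raw futex system call (no glibc wrapper exists). */
static long futex(atomic_int *addr, int op, int val) {
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

static void futex_lock_acquire(futex_lock *l, int spin_limit) {
    int c = 0;
    /* Fast path plus optional spinning: try 0 -> 1 up to spin_limit+1 times. */
    for (int i = 0; i <= spin_limit; i++) {
        c = 0;
        if (atomic_compare_exchange_strong(&l->state, &c, 1))
            return;                 /* uncontended acquire, no system call */
    }
    /* Slow path: mark the lock as contended, then sleep until it is free. */
    if (c != 2)
        c = atomic_exchange(&l->state, 2);
    while (c != 0) {
        /* Sleeps only if state is still 2; otherwise returns immediately. */
        futex(&l->state, FUTEX_WAIT_PRIVATE, 2);
        c = atomic_exchange(&l->state, 2);
    }
}

static void futex_lock_release(futex_lock *l) {
    /* If the old value was 2 there may be sleepers: reset and wake one. */
    if (atomic_fetch_sub(&l->state, 1) != 1) {
        atomic_store(&l->state, 0);
        futex(&l->state, FUTEX_WAKE_PRIVATE, 1);
    }
}
```

With `spin_limit` set to 0 the acquire makes a single attempt before going to the kernel; larger values correspond to the "with spinning" variant.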
nofib/parallel/ray on 8 cores, first with futexes and then with yield:
{{{
$ ./ray 1000 +RTS -N8 -s >/dev/null
14,784,695,584 bytes allocated in the heap
246,403,264 bytes copied during GC
108,232 bytes maximum residency (169 sample(s))
310,000 bytes maximum slop
6 MB total memory in use (0 MB lost due to fragmentation)
Generation 0: 4606 collections, 4605 parallel, 4.31s, 2.16s elapsed
Generation 1: 169 collections, 169 parallel, 0.22s, 0.08s elapsed
Parallel GC work balance: 1.56 (30214391 / 19430130, ideal 8)
SPARKS: 1000000 (978174 converted, 21636 pruned)
INIT time 0.00s ( 0.00s elapsed)
MUT time 8.96s ( 1.86s elapsed)
GC time 4.53s ( 2.24s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 13.49s ( 4.11s elapsed)
$ ./ray-yield 1000 +RTS -N8 -s >/dev/null
14,834,802,304 bytes allocated in the heap
237,105,080 bytes copied during GC
97,736 bytes maximum residency (158 sample(s))
299,160 bytes maximum slop
6 MB total memory in use (0 MB lost due to fragmentation)
Generation 0: 4515 collections, 4514 parallel, 7.73s, 1.65s elapsed
Generation 1: 158 collections, 158 parallel, 0.39s, 0.06s elapsed
Parallel GC work balance: 1.53 (29092959 / 18954020, ideal 8)
SPARKS: 1000000 (980491 converted, 19356 pruned)
INIT time 0.00s ( 0.00s elapsed)
MUT time 10.74s ( 1.93s elapsed)
GC time 8.11s ( 1.71s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 18.86s ( 3.64s elapsed)
}}}
The extra CPU time in the yield version is due to the higher spin
threshold, but I've tried different spin thresholds in the futex case and
it didn't help.
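To illustrate why a higher spin threshold shows up as extra CPU time, here is a rough sketch of a spin-then-yield acquire. It is not the RTS code: `yield_lock`, `yield_lock_acquire`, `yield_lock_release` and `spin_threshold` are hypothetical names. Every iteration of the inner loop is CPU time charged to the waiting thread, so raising the threshold raises CPU time even when elapsed time does not get worse.

```c
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical test-and-set lock; initialise with { ATOMIC_FLAG_INIT }. */
typedef struct { atomic_flag held; } yield_lock;

/* Acquire the lock; returns how many times we gave up the CPU. */
static unsigned long yield_lock_acquire(yield_lock *l, unsigned spin_threshold) {
    unsigned long yields = 0;
    for (;;) {
        unsigned spins = 0;
        do {
            /* test_and_set returns the previous value: false means we won. */
            if (!atomic_flag_test_and_set_explicit(&l->held,
                                                   memory_order_acquire))
                return yields;
        } while (spins++ < spin_threshold);   /* busy-wait burns CPU time */
        sched_yield();                         /* let another thread run */
        yields++;
    }
}

static void yield_lock_release(yield_lock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```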
--
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/3553#comment:5>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler