#3553: parallel gc suffers badly if one thread is descheduled
---------------------------------+------------------------------------------
    Reporter:  simonmar          |        Owner:  igloo       
        Type:  merge             |       Status:  new         
    Priority:  normal            |    Milestone:  6.12.2      
   Component:  Runtime System    |      Version:  6.10.4      
    Keywords:                    |   Difficulty:  Unknown     
          Os:  Unknown/Multiple  |     Testcase:              
Architecture:  Unknown/Multiple  |      Failure:  None/Unknown
---------------------------------+------------------------------------------

Comment(by simonmar):

 Patch to use futexes attached.  It is significantly slower than the
 `sched_yield` version currently in use (2.24s vs 1.71s elapsed GC time in
 the runs below), and I don't know why: as far as I can tell I'm using
 futexes correctly.  The protocol is the one from Drepper's paper, and I
 tried it both with and without some user-space spinning in the acquire
 case.
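
 For readers who have not seen the protocol, here is a minimal sketch of a
 three-state futex lock in the style of Drepper's "Futexes Are Tricky"
 (0 = free, 1 = held, 2 = held with possible waiters), with an optional
 bounded spin before sleeping.  This is not the attached patch: which
 variant from the paper the patch uses is an assumption, it is Linux-only,
 and every name here (`futex_lock`, `futex_lock_acquire`, `spin_limit`,
 `futex_lock_demo`) is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical lock type: 0 = free, 1 = held, 2 = held, maybe waiters. */
typedef struct { _Atomic uint32_t state; } futex_lock;

/* Thin wrapper over the raw futex system call (glibc exports no wrapper). */
static long futex_op(_Atomic uint32_t *addr, int op, uint32_t val) {
    return syscall(SYS_futex, (void *)addr, op, val, NULL, NULL, 0);
}

static void futex_lock_acquire(futex_lock *l, int spin_limit) {
    uint32_t c;
    /* Optional user-space spinning: retry the uncontended path a few times. */
    for (int i = 0; i < spin_limit; i++) {
        c = 0;
        if (atomic_compare_exchange_strong(&l->state, &c, 1)) return;
    }
    c = 0;
    if (atomic_compare_exchange_strong(&l->state, &c, 1)) return;
    /* Contended: c holds the value we observed (1 or 2).  Mark the lock as
       having waiters, then sleep until we see it free. */
    if (c != 2) c = atomic_exchange(&l->state, 2);
    while (c != 0) {
        /* Sleeps only if state is still 2; otherwise returns immediately. */
        futex_op(&l->state, FUTEX_WAIT_PRIVATE, 2);
        c = atomic_exchange(&l->state, 2);
    }
}

static void futex_lock_release(futex_lock *l) {
    /* Old value 1 means nobody was waiting, so no system call is needed. */
    if (atomic_fetch_sub(&l->state, 1) != 1) {
        atomic_store(&l->state, 0);
        futex_op(&l->state, FUTEX_WAKE_PRIVATE, 1);  /* wake one waiter */
    }
}

typedef struct {
    futex_lock *lock;
    long *counter;
    int iters;
    int spin_limit;
} futex_worker_arg;

static void *futex_worker(void *p) {
    futex_worker_arg *a = p;
    for (int i = 0; i < a->iters; i++) {
        futex_lock_acquire(a->lock, a->spin_limit);
        (*a->counter)++;               /* protected by the lock */
        futex_lock_release(a->lock);
    }
    return NULL;
}

/* Runs nthreads workers that each increment a shared counter iters times
   and returns the final count (at most 64 threads). */
static long futex_lock_demo(int nthreads, int iters, int spin_limit) {
    futex_lock lock = { 0 };
    long counter = 0;
    pthread_t tid[64];
    futex_worker_arg arg = { &lock, &counter, iters, spin_limit };
    assert(nthreads > 0 && nthreads <= 64);
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tid[i], NULL, futex_worker, &arg);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++) pthread_join(tid[i], NULL);
    return counter;
}
```

 Calling `futex_lock_demo(8, 100000, 0)` exercises the pure futex path, and
 a non-zero last argument adds spinning in the acquire case.  Note that the
 uncontended acquire and release never enter the kernel.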

 nofib/parallel/ray on 8 cores, first with futexes and then with yield:

 {{{
 $ ./ray 1000 +RTS -N8 -s >/dev/null
   14,784,695,584 bytes allocated in the heap
      246,403,264 bytes copied during GC
          108,232 bytes maximum residency (169 sample(s))
          310,000 bytes maximum slop
                6 MB total memory in use (0 MB lost due to fragmentation)

   Generation 0:  4606 collections,  4605 parallel,  4.31s,  2.16s elapsed
   Generation 1:   169 collections,   169 parallel,  0.22s,  0.08s elapsed

   Parallel GC work balance: 1.56 (30214391 / 19430130, ideal 8)

   SPARKS: 1000000 (978174 converted, 21636 pruned)

   INIT  time    0.00s  (  0.00s elapsed)
   MUT   time    8.96s  (  1.86s elapsed)
   GC    time    4.53s  (  2.24s elapsed)
   EXIT  time    0.00s  (  0.00s elapsed)
   Total time   13.49s  (  4.11s elapsed)


 $ ./ray-yield 1000 +RTS -N8 -s >/dev/null
   14,834,802,304 bytes allocated in the heap
      237,105,080 bytes copied during GC
           97,736 bytes maximum residency (158 sample(s))
          299,160 bytes maximum slop
                6 MB total memory in use (0 MB lost due to fragmentation)

   Generation 0:  4515 collections,  4514 parallel,  7.73s,  1.65s elapsed
   Generation 1:   158 collections,   158 parallel,  0.39s,  0.06s elapsed

   Parallel GC work balance: 1.53 (29092959 / 18954020, ideal 8)

   SPARKS: 1000000 (980491 converted, 19356 pruned)

   INIT  time    0.00s  (  0.00s elapsed)
   MUT   time   10.74s  (  1.93s elapsed)
   GC    time    8.11s  (  1.71s elapsed)
   EXIT  time    0.00s  (  0.00s elapsed)
   Total time   18.86s  (  3.64s elapsed)
 }}}

 The extra CPU time in the yield version (8.11s vs 4.53s of GC time) is due
 to its higher spin threshold.  I've tried different spin thresholds in the
 futex case, but it didn't help.
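
 To make the role of the spin threshold concrete, here is a rough sketch of
 a spin-then-yield lock.  It is an illustration only, not the code GHC
 uses: `yield_lock`, `spin_threshold` and `yield_lock_demo` are hypothetical
 names.  A waiter burns CPU for up to `spin_threshold` attempts before each
 `sched_yield`, so a higher threshold shows up as more CPU time without the
 waiter ever sleeping in the kernel.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical test-and-set lock that yields instead of blocking. */
typedef struct { atomic_flag held; } yield_lock;

static void yield_lock_acquire(yield_lock *l, int spin_threshold) {
    for (;;) {
        int i = 0;
        /* Spin in user space; always try at least once per round. */
        do {
            if (!atomic_flag_test_and_set_explicit(&l->held,
                                                   memory_order_acquire))
                return;                /* flag was clear: lock acquired */
        } while (++i < spin_threshold);
        /* Still held: give up the CPU so the holder can run. */
        sched_yield();
    }
}

static void yield_lock_release(yield_lock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

typedef struct {
    yield_lock *lock;
    long *counter;
    int iters;
    int spin_threshold;
} yield_worker_arg;

static void *yield_worker(void *p) {
    yield_worker_arg *a = p;
    for (int i = 0; i < a->iters; i++) {
        yield_lock_acquire(a->lock, a->spin_threshold);
        (*a->counter)++;               /* protected by the lock */
        yield_lock_release(a->lock);
    }
    return NULL;
}

/* Runs nthreads workers that each increment a shared counter iters times
   and returns the final count (at most 64 threads). */
static long yield_lock_demo(int nthreads, int iters, int spin_threshold) {
    yield_lock lock = { ATOMIC_FLAG_INIT };
    long counter = 0;
    pthread_t tid[64];
    yield_worker_arg arg = { &lock, &counter, iters, spin_threshold };
    assert(nthreads > 0 && nthreads <= 64);
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tid[i], NULL, yield_worker, &arg);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++) pthread_join(tid[i], NULL);
    return counter;
}
```

 Running `yield_lock_demo` with a small and a large `spin_threshold` under
 `time` is one way to see the CPU-time effect on a given machine.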

-- 
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/3553#comment:5>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
_______________________________________________
Glasgow-haskell-bugs mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs
