#1889: Regression in concurrency performance from ghc 6.6 to 6.8
-----------------------------------------------+----------------------------
 Reporter:  dons                               |          Owner:  simonmar
     Type:  bug                                |         Status:  new     
 Priority:  normal                             |      Milestone:  6.8.3   
Component:  Runtime System                     |        Version:  6.8.1   
 Severity:  normal                             |     Resolution:          
 Keywords:  threads, concurrency, performance  |     Difficulty:  Unknown 
 Testcase:                                     |   Architecture:  Multiple
       Os:  Multiple                           |  
-----------------------------------------------+----------------------------
Comment (by simonmar):

 I don't see the differences reported.  `threadring` runs at exactly the
 same speed with 6.6.1 and 6.8.1 here, and `chameneos` is slightly faster
 with 6.8.1.  So we have to look at how your GHC was built: for reference
 the builds I'm using are

 {{{
 BeConservative = YES
 XMLDocWays=html
 PublishCp=rsync
 [EMAIL PROTECTED]:/home/haskell/ghc/dist/stable
 GhcStage2HcOpts=-DDEBUG -debug
 GhcLibHcOpts=-O2 -fasm -dcore-lint -fgenerics
 HADDOCK_DOCS=YES
 }}}

 The only thing that should make a difference in performance relative to
 the default build is the `GhcLibHcOpts` line.

 These are the build settings used by the nightly builds, and the same
 settings are used to build the binary distributions we ship from
 haskell.org.

 Could someone who is seeing a performance difference give more details:
 OS/architecture, GHC build settings (or where you got your binaries from),
 and gcc version?  I'll see if I can reproduce it from that.

 In reply to jedbrown:  here are the results I get

 {{{
 > for e in ./ghc-66-O ./ghc-66-O2 ./ghc-68-O ./ghc-68-O2 ; do time $e 7000000 >/dev/null; done
 7.97s real   7.96s user   0.01s system   99% $e 7000000 > /dev/null
 7.47s real   7.44s user   0.01s system   99% $e 7000000 > /dev/null
 6.93s real   6.92s user   0.01s system   100% $e 7000000 > /dev/null
 6.89s real   6.85s user   0.02s system   99% $e 7000000 > /dev/null
 }}}

 This is on x86_64/Linux with gcc 4.1.0.

 In reply to j.waldmann: the first result is a known issue: adding
 `-threaded` turns on atomic locking for `MVar` operations (see #693).  The
 atomic operations aren't necessary with `-N1`, so that ticket suggests
 adding some conditionals to speed things up in that case.
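
 As a rough illustration (not from the ticket), the programs affected are
 those whose runtime is dominated by `MVar` primitives.  The sketch below is
 a hypothetical microbenchmark for comparing a build without `-threaded`
 against one with it (both effectively running on a single HEC):

 {{{
 -- Hypothetical microbenchmark, not part of the ticket or nofib: a tight
 -- loop of takeMVar/putMVar pairs, so any per-operation locking added by
 -- -threaded dominates the runtime.
 import Control.Concurrent.MVar
 import Control.Monad
 import System.Environment

 main :: IO ()
 main = do
   n <- getArgs >>= readIO . head
   m <- newMVar (0 :: Int)
   replicateM_ n (takeMVar m >>= \x -> putMVar m $! x + 1)
   readMVar m >>= print
 }}}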

 The second result, namely that adding `-N2` slows things down even more,
 happens because this test is hard to parallelise.  Unless the scheduler
 manages to schedule exactly half the ring on each CPU, performance goes
 down the
 drain due to the communication overhead.  You can get a modest speedup
 using `GHC.Conc.forkOnIO` to fix the threads to CPUs.  Here is the version
 of the benchmark we have in `nofib/smp/threads004` for testing the
 scheduler:

 {{{
 import Control.Concurrent
 import Control.Monad
 import System
 import GHC.Conc (forkOnIO)

 -- Each element of the ring takes a value from its input MVar, increments
 -- it, and passes it to its output MVar, forever.
 thread :: MVar Int -> MVar Int -> IO ()
 thread inp out = do x <- takeMVar inp; putMVar out $! x+1; thread inp out

 -- Pin the first 1000 threads to CPU 0 and the remaining 1000 to CPU 1.
 spawn cur n = do next <- newEmptyMVar
                  forkOnIO (if (n <= 1000) then 0 else 1) $ thread cur next
                  return next

 main = do n <- getArgs >>= readIO.head
           s <- newEmptyMVar
           -- build the 2000-thread ring between s and e
           e <- foldM spawn s [1..2000]
           f <- newEmptyMVar
           -- a collector thread on CPU 1 sums the n values that emerge at e
           forkOnIO 1 $ replicateM n (takeMVar e) >>= putMVar f . sum
           -- inject n tokens at s and wait for the final sum
           replicateM n (putMVar s 0)
           takeMVar f
 }}}
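
 For completeness, one plausible way to build and run it by hand, outside
 the nofib harness (the source file name, output name, and argument below
 are illustrative; the argument is the number of tokens pushed through the
 2000-thread ring):

 {{{
 $ ghc --make -O2 -threaded threads004.hs -o threads004
 $ ./threads004 2000 +RTS -N2    # 2000 tokens is just an illustrative size
 }}}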

-- 
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/1889#comment:4>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler