#1889: Regression in concurrency performance from ghc 6.6 to 6.8
-----------------------------------------------+----------------------------
Reporter: dons | Owner: simonmar
Type: bug | Status: new
Priority: normal | Milestone: 6.8.3
Component: Runtime System | Version: 6.8.1
Severity: normal | Resolution:
Keywords: threads, concurrency, performance | Difficulty: Unknown
Testcase: | Architecture: Multiple
Os: Multiple |
-----------------------------------------------+----------------------------
Comment (by simonmar):

I don't see the differences reported. `threadring` runs at exactly the
same speed with 6.6.1 and 6.8.1 here, and `chameneos` is slightly faster
with 6.8.1. So we have to look at how your GHC was built; for reference,
the builds I'm using are:
{{{
BeConservative = YES
XMLDocWays=html
PublishCp=rsync
[EMAIL PROTECTED]:/home/haskell/ghc/dist/stable
GhcStage2HcOpts=-DDEBUG -debug
GhcLibHcOpts=-O2 -fasm -dcore-lint -fgenerics
HADDOCK_DOCS=YES
}}}
The only thing that should make a difference in performance relative to
the default build is the `GhcLibHcOpts` line. These are the build
settings used by the nightly builds, and the same settings are used to
build the binary distributions we ship from haskell.org.

Can someone who is seeing a performance difference give more details:
OS/architecture, GHC build settings (or where you got your binaries
from), and gcc version? I'll see if I can reproduce it from that.

In reply to jedbrown: here are the results I get:
{{{
> for e in ./ghc-66-O ./ghc-66-O2 ./ghc-68-O ./ghc-68-O2 ; do time $e 7000000 >/dev/null; done
7.97s real 7.96s user 0.01s system 99% $e 7000000 > /dev/null
7.47s real 7.44s user 0.01s system 99% $e 7000000 > /dev/null
6.93s real 6.92s user 0.01s system 100% $e 7000000 > /dev/null
6.89s real 6.85s user 0.02s system 99% $e 7000000 > /dev/null
}}}
This is on x86_64/Linux with gcc 4.1.0.

In reply to j.waldmann: the first result is known: adding `-threaded`
turns on atomic locking for `MVar` operations (see #693). The atomic
operations aren't necessary with `-N1`, so that ticket suggests adding
conditionals to avoid them in that case.
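To make that effect concrete, here is a minimal sketch (an illustration,
not part of the ticket) of an `MVar` ping-pong loop whose run time is
dominated by `takeMVar`/`putMVar`; compiling it with and without
`-threaded` and running it on a single CPU shows the extra cost of the
atomic locking:
{{{
import Control.Concurrent
import Control.Monad

-- Bounce a counter between two MVars. Every round trip is a
-- takeMVar/putMVar pair on each side, so the run time is dominated
-- by the cost of the MVar primitives themselves.
main :: IO ()
main = do
  a <- newMVar (0 :: Int)
  b <- newEmptyMVar
  _ <- forkIO $ forever $ takeMVar a >>= putMVar b . (+ 1)
  let go 0 = takeMVar b
      go k = do { x <- takeMVar b; putMVar a x; go (k - 1 :: Int) }
  go 1000000 >>= print
}}}
With `-N1` only one Haskell thread executes at any instant, which is why
#693 argues the locking could be skipped in that case.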

The second result, namely that adding `-N2` slows things down even more,
is because this test is hard to parallelise. Unless the scheduler manages
to schedule exactly half the ring on each CPU, performance goes down the
drain due to the communication overhead. You can get a modest speedup by
using `GHC.Conc.forkOnIO` to pin the threads to CPUs. Here is the version
of the benchmark we have in `nofib/smp/threads004` for testing the
scheduler:
{{{
import Control.Concurrent
import Control.Monad
import System
import GHC.Conc (forkOnIO)

thread :: MVar Int -> MVar Int -> IO ()
thread inp out = do x <- takeMVar inp; putMVar out $! x+1; thread inp out

spawn cur n = do next <- newEmptyMVar
                 forkOnIO (if n <= 1000 then 0 else 1) $ thread cur next
                 return next

main = do n <- getArgs >>= readIO . head
          s <- newEmptyMVar
          e <- foldM spawn s [1..2000]
          f <- newEmptyMVar
          forkOnIO 1 $ replicateM n (takeMVar e) >>= putMVar f . sum
          replicateM n (putMVar s 0)
          takeMVar f
}}}
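As a rough usage sketch (not from the original comment): `+RTS -N2` only
has an effect if the program was built with `-threaded`, so a typical run
would be something like `ghc --make -O2 -threaded threads004.hs` followed
by `./threads004 1000000 +RTS -N2`, where the argument is the number of
values pushed through the chain. The `forkOnIO` calls put the first half
of the chain on CPU 0 and the second half on CPU 1, so that only the
hand-off in the middle of the chain has to cross CPUs.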
--
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/1889#comment:4>