On Thu, Feb 26, 2009 at 6:23 PM, Don Stewart <d...@galois.com> wrote: > But note the lazy list of Double pairs, so the inner loop still looks like > this though: > > $wlgo :: Int# -> [(Double, Double)] -> Int > > $wlgo = > \ (ww_s1pv :: Int#) > (w_s1px :: [(Double, Double)]) -> > case w_s1px of wild_aTl { > [] -> I# ww_s1pv; > : x_aTp xs_aTq -> > case x_aTp of wild1_B1 { (x1_ak3, y_ak5) -> > case x1_ak3 of wild2_aX8 { D# x2_aXa -> > case y_ak5 of wild3_XYs { D# x3_XYx -> > case <=## > (sqrtDouble# > (+## > (*## x2_aXa x2_aXa) (*## x3_XYx x3_XYx))) > 1.0 > of wild4_X1D { > False -> $wlgo ww_s1pv xs_aTq; > True -> $wlgo (+# ww_s1pv 1) xs_aTq > } > > while we want to keep everything in registers with something like: > > Int# -> Double# -> Double# -> Int# > > So we'll be paying a penalty to force the next elem of the list (instead of > just calling the Double generator). This definitely has an impact on > performance. > > $ ghc-core B.hs -O2 -fvia-C -optc-O3 -fexcess-precision -optc-march=core2 > -funbox-strict-fields > > $ time ./B 10000000 > 3.1407688 > ./B 10000000 2.41s user 0.01s system 99% cpu 2.415 total > > > Now, what if we just rewrote that inner loop directly to avoid intermediate > stuff? That'd give > us a decent lower bound. > > {-# LANGUAGE BangPatterns #-} > > import System.Environment > import System.Random.Mersenne > > isInCircle :: Double -> Double -> Bool > isInCircle x y = sqrt (x*x + y*y) <= 1.0 > > countHits :: Int -> IO Int > countHits lim = do > g <- newMTGen Nothing > let go :: Int -> Int -> IO Int > go !throws !hits > | throws >= lim = return hits > | otherwise = do > x <- random g -- use mersenne-random-pure64 to stay pure! > y <- random g > if isInCircle x y > then go (throws+1) (hits+1) > else go (throws+1) hits > go 0 0 > > monteCarloPi :: Int -> IO Double > monteCarloPi n = do > hits <- countHits n > return $ 4.0 * fromIntegral hits / fromIntegral n > > main = do > [n] <- getArgs > res <- monteCarloPi (read n) > print res > > And now the inner loop looks like: > > $wa_s1yW :: Int# > -> Int# > -> State# RealWorld > -> (# State# RealWorld, Int #) > > Pretty good. Can't avoid the Int boxed return (and resulting heap check) due > to use of IO monad. > But at least does away with heap allocs in the inner loop! > > How does it go: > > $ ghc-core A.hs -O2 -fvia-C -optc-O3 -fexcess-precision -optc-march=core2 > -funbox-strict-fields > > $ time ./A 10000000 > 3.1412564 > ./A 10000000 0.81s user 0.00s system 99% cpu 0.818 total > > Ok. So 3 times faster. Now the goal is to recover the high level version. > We have many tools to employ: switching to mersenne-random-pure64 might help > here. And seeing if you can fuse filling a uvector with randoms, and folding > over it... t > > -- Don >
Very nice! I also wrote a naive version which used uvector but it was about twice as slow as the original Haskell version. I wanted to write "lengthU . filterU isInCircle" because that clearly expresses the algorithm. Sadly I was at work and didn't have time for profile the program to see what was wrong. Still, I couldn't resist having a go at the problem :-) _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe