Thanks! I have downloaded tool and playing with it.I will use boxed vectors in the future ;)
> Date: Sat, 1 Dec 2012 13:22:47 -0500 > Subject: Re: [Haskell-cafe] Why is boxed mutable array so slow? > From: don...@gmail.com > To: bm...@hotmail.com > CC: haskell-cafe@haskell.org > > Regarding when to use mutable arrays versus vectors, I would always > use vectors -- they optimize better, and have a better interface. > > Also, I have updated and released a new version of the tool mentioned below. > You can get it on Hackage, updated to ghc 7 series. > > http://hackage.haskell.org/package/ghc-gc-tune-0.3 > > For your boxed vector program, we get results that show a clear > performance peak with a -A of around 64M, about the size of the > allocated array ... > > http://i.imgur.com/dZ2Eo.png > > Best settings for Running time: > 0.16s: +RTS -A67108864 -H1048576 > 0.16s: +RTS -A67108864 -H2097152 > 0.16s: +RTS -A67108864 -H8388608 > > E.g. > > $ time ./A +RTS -A67M -H1M > boxed vector > last 945735787 seconds 0.123 > > -- Don > > On Sat, Dec 1, 2012 at 12:20 PM, Don Stewart <don...@gmail.com> wrote: > > The obvious difference between boxed and unboxed arrays is that the > > boxed arrays are full of pointers to heap allocated objects. This > > means you pay indirection to access the values, much more time in GC > > spent chasing pointers (though card marking helps), and generally do > > more allocation. > > > > Compare the GC stats below, for > > > > * Boxed vector: 88M bytes copied; 75% of time in GC, 0.472s > > * Unboxed vector: 11k bytes copied, 1.3% of time in GC, 0.077s > > > > So there's your main answer. The increased data density of unboxed > > arrays also helps a too. > > > > Now, you can help out the GC signifcantly by hinting at how much > > you're going to allocated in the youngest generation (see the > > ghc-gc-tune app for a methodical approach to this, though it needs > > updating to ghc 7 -- > > http://donsbot.wordpress.com/2010/07/05/ghc-gc-tune-tuning-haskell-gc-settings-for-fun-and-profit/ > > and > > http://stackoverflow.com/questions/3171922/ghcs-rts-options-for-garbage-collection > > ). > > > > Use the +RTS -A flag to set an initial youngest generation heap size > > to the size of your array, and watch the GC cost disappear. For our > > boxed vector, we'd use +RTS -A50M, resulting in: > > > > * Boxed vector: 8k copied, 1% of time in GC, 0.157s > > > > So not bad. 3x speedup through a RTS flag. -A is very useful if you > > are working with boxed, mutable arrays. > > > > For reference, there's a generic version below that specializes based > > on the vector type parameter. > > > > --------------------------------- > > > > {-# LANGUAGE BangPatterns #-} > > > > import System.CPUTime > > import Text.Printf > > import Data.Int > > import Control.DeepSeq > > import System.Mem > > > > import qualified Data.Vector.Mutable as V > > import qualified Data.Vector.Unboxed.Mutable as U > > import qualified Data.Vector.Generic.Mutable as G > > > > main :: IO() > > main = do > > > > -- (G.new n' :: IO (V.IOVector Int32)) >>= test' "boxed vector" > > -- performGC > > (G.new n' :: IO (U.IOVector Int32)) >>= test' "unboxed vector" > > performGC > > > > test' s a = do > > putStrLn s > > begin <- getCPUTime > > init'' a > > partial_sum' a > > end <- getCPUTime > > let diff = (fromIntegral (end - begin)) / (10**12) > > last <- G.read a (n'-1) > > printf "last %d seconds %.3f\n" last (diff::Double) > > > > n' :: Int > > n' = 1000 * 1000 > > > > init'' !a = init 0 (n'-1) > > where > > init :: Int -> Int -> IO () > > init !k !n > > | k > n = return () > > | otherwise = do > > let !x = fromIntegral $ k + k `div` 3 > > G.write a k x > > init (k+1) n > > > > > > > > partial_sum' !a = do > > k <- G.read a 0 > > ps 1 (n'-1) k > > where > > ps :: Int -> Int -> Int32 -> IO () > > ps i n s > > | i > n = return () > > | otherwise = do > > k <- G.read a i > > let !l = fromIntegral $ s + k > > G.write a i l > > ps (i+1) n l > > > > > > --------------------------------- > > > > $ time ./A +RTS -s > > boxed vector > > last 945735787 seconds 0.420 > > 40,121,448 bytes allocated in the heap > > 88,355,272 bytes copied during GC > > 24,036,456 bytes maximum residency (6 sample(s)) > > 380,632 bytes maximum slop > > 54 MB total memory in use (0 MB lost due to fragmentation) > > > > %GC time 75.2% (75.9% elapsed) > > > > Alloc rate 359,655,602 bytes per MUT second > > > > ./A +RTS -s 0.40s user 0.07s system 98% cpu 0.475 total > > > > > > $ time ./A +RTS -s > > unboxed vector > > last 945735787 seconds 0.080 > > 4,113,568 bytes allocated in the heap > > 11,288 bytes copied during GC > > 4,003,256 bytes maximum residency (3 sample(s)) > > 182,856 bytes maximum slop > > 5 MB total memory in use (0 MB lost due to fragmentation) > > > > %GC time 1.3% (1.3% elapsed) > > > > Alloc rate 51,416,660 bytes per MUT second > > > > ./A +RTS -s 0.08s user 0.01s system 98% cpu 0.088 total > > > > > > $ time ./A +RTS -A50M -s > > boxed vector > > last 945735787 seconds 0.127 > > 40,121,504 bytes allocated in the heap > > 8,032 bytes copied during GC > > 44,704 bytes maximum residency (2 sample(s)) > > 20,832 bytes maximum slop > > 59 MB total memory in use (0 MB lost due to fragmentation) > > > > %GC time 1.0% (1.0% elapsed) > > > > Productivity 97.4% of total user, 99.6% of total elapsed > > > > ./A +RTS -A50M -s 0.10s user 0.05s system 97% cpu 0.157 total > > > > > > > > --------------------------------- > > > > > > On Sat, Dec 1, 2012 at 11:09 AM, Branimir Maksimovic <bm...@hotmail.com> > > wrote: > >> I have made benchmark test inspired by > >> http://lemire.me/blog/archives/2012/07/23/is-cc-worth-it/ > >> > >> What surprised me is that unboxed array is much faster than boxed array. > >> Actually boxed array performance is on par with standard Haskell list > >> which is very slow. > >> All in all even unboxed array is about 10 times slower than Java version. > >> I don't understand why is even unboxed array so slow. > >> But! unboxed array consumes least amount of RAM. > >> (warning, program consumes more than 3gb of ram) > >> > >> bmaxa@maxa:~/examples$ time ./Cumul > >> boxed array > >> last 262486571 seconds 4.972 > >> unboxed array > >> last 262486571 seconds 0.776 > >> list > >> last 262486571 seconds 6.812 > >> > >> real 0m13.086s > >> user 0m11.996s > >> sys 0m1.080s > >> > >> ------------------------------------------------------------------------- > >> {-# LANGUAGE CPP, BangPatterns #-} > >> import System.CPUTime > >> import Text.Printf > >> import Data.Array.IO > >> import Data.Array.Base > >> import Data.Int > >> import Control.DeepSeq > >> import System.Mem > >> > >> main :: IO() > >> main = do > >> (newArray_ (0,n'-1) :: IO(A)) >>= test "boxed array" > >> performGC > >> (newArray_ (0,n'-1) :: IO(B)) >>= test "unboxed array" > >> performGC > >> begin <- getCPUTime > >> printf "list\nlast %d" $ last $ force $ take n' $ sum' data' > >> end <- getCPUTime > >> let diff = (fromIntegral (end - begin)) / (10^12) > >> printf " seconds %.3f\n" (diff::Double) > >> > >> test s a = do > >> putStrLn s > >> begin <- getCPUTime > >> init' a > >> partial_sum a > >> end <- getCPUTime > >> let diff = (fromIntegral (end - begin)) / (10^12) > >> last <- readArray a (n'-1) > >> printf "last %d seconds %.3f\n" last (diff::Double) > >> > >> n' :: Int > >> n' = 50 * 1000 * 1000 > >> > >> type A = IOArray Int Int32 > >> type B = IOUArray Int Int32 > >> > >> init' a = do > >> (_,n) <- getBounds a > >> init a 0 n > >> where > >> init a k n > >> | k > n = return () > >> | otherwise = do > >> let !x = fromIntegral $ k + k `div` 3 > >> unsafeWrite a k x > >> init a (k+1) n > >> > >> partial_sum a = do > >> (_,n) <- getBounds a > >> k <- unsafeRead a 0 > >> ps a 1 n k > >> where > >> ps a i n s > >> | i > n = return () > >> | otherwise = do > >> k <- unsafeRead a i > >> let !l = fromIntegral $ s + k > >> unsafeWrite a i l > >> ps a (i+1) n l > >> > >> data' :: [Int32] > >> data' = [k + k `div` 3 | k <- [0..] ] > >> > >> sum' = scanl1 (+) > >> > >> > >> _______________________________________________ > >> Haskell-Cafe mailing list > >> Haskell-Cafe@haskell.org > >> http://www.haskell.org/mailman/listinfo/haskell-cafe > >>
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe