Hi again, all. So I rewrote some of the versions, so there are now six versions of the array normalization code. They are:
normal: combination of foldM and mapM_ loop: a two-pass loop mimicking foldM and mapM_ unboxed-normal: normal on unboxed arrays unboxed-loop: loop on unboxed arrays fix: using fixIO and a look with Double accumulator cpsfix: using fixIO and a CPS accumulator I ran each of these on arrays of size 100,000, 250,000, 500,000 and 1,000,000 elements. The results are: | FIXIO | TWO LOOPS | fix cpsfix | map/fold loop unboxed-m/f unboxed-loop --------+--------------------+-------------------------------------------------- 100000 | 0.71 1.37 | 1.60 1.61 0.90 0.42 250000 | 6.51 5.74 | 7.48 7.24 2.90 1.22 500000 | 23.88 21.32 | 25.35 26.38 7.51 2.27 1000000 | 92.72 79.16 | 97.83 105.73 21.78 4.54 (sorry if that wraps on your screen -- see the attached file) So, looking at the FIXIO methods, for large arrays, cpsfix seems to dominate. There's a small overhead for small arrays, but this is passed once we get to 250,000 elements (probably much before). I'm (pleasantly) surprised that both fix and cps fix consistently beat the two boxed implementations on the right. I'm shocked that the handwritten loop version does worse than the map/fold version and cannot explain this. Yet, when looking at the unboxed arrays, the map/fold version does *much* worse than the loop version. I'm guessing this has to do with the fact that in the loop version the compiler is unboxing the index variable for the whole loop, rather than unboxing each element in the list for map/fold one-by-one. While I'm happy that the fix versions outperform the 2-pass versions for boxed arrays, the discrepency between 79.16 seconds for one million elements and 4.54 sectons on the same data is alarming. Can anyone suggest a way to reconcile this? - Hal p.s., I've attached the code and results (as comments in the code). -- Hal Daume III "Computer science is no more about computers | [EMAIL PROTECTED] than astronomy is about telescopes." -Dijkstra | www.isi.edu/~hdaume On Sat, 28 Sep 2002, Levent Erkok wrote: > On Friday 27 September 2002 05:19 pm, Hal Daume III wrote: > > There is a one pass solution using mfix. This looks like: > > > > mfix (\ ~s -> ioaA a 1 s) > > > > where ioaA is defined as: > > > > ioaA a i s > > > > | i > sz = return 0 > > | otherwise = > > > > do s' <- ioaA a (i+1) s > > v <- readArray a i > > writeArray a i (v/s) > > return (s' + v) > > where (_,sz) = Data.Array.IO.bounds a > > > > Using unboxed arrays is not possible in the fixed point version > > (loop). On the normal version, it just about triples the speed across the > > board. > > Hi, > > I'm not sure if it's mfix we should blame here. Can you please > experiment with the following version of normalize and report on > running times? > > norm a = mdo s <- ioaA 1 s id > return () > where (_, sz) = bounds a > ioaA i s acc > | i > sz = return (acc 0) > | True = > do v <- readArray a i > writeArray a i (v/s) > ioaA (i+1) s (\x -> v + acc x) > > It'd be interesting to see the results especially for very large > inputs (i.e. >= half a million elements.) > > -Levent. >
module Main where import Data.Array import Data.Array.IO import Control.Monad import Control.Monad.Fix import System ioaA a s = ioaA' 1 s 0 where ioaA' i s acc | i > sz = return acc | True = do v <- readArray a i writeArray a i (v / s) ioaA' (i+1) s $! (v + acc) (_, sz) = Data.Array.IO.bounds a ioaACPS a s = ioaACPS' 1 s id where (_, sz) = Data.Array.IO.bounds a ioaACPS' i s acc | i > sz = return (acc 0) | True = do v <- readArray a i writeArray a i (v/s) ioaACPS' (i+1) s (\x -> v + acc x) main = do [method,n] <- getArgs (a::IOArray Int Double) <- newListArray (1::Int,read n) [(1::Double)..read n] if method == "fix" then mfix (\ ~s -> ioaA a s) >> return () else if method == "cps" then mfix (\ ~s -> ioaACPS a s) >> return () else if method == "loop" then normLoop a else norm a return () norm a = do t <- foldM (\t i -> (t+) `liftM` readArray a i) 0 [1..sz] mapM_ (\i -> readArray a i >>= writeArray a i . (/t)) [1..sz] where (_,sz) = Data.Array.IO.bounds a normLoop a = do t <- normLoop' 1 0 normLoop'' 1 t where normLoop' i acc | i > sz = return acc | True = do v <- readArray a i normLoop' (i+1) $! (v + acc) normLoop'' i t | i > sz = return () | True = do v <- readArray a i writeArray a i (v/t) normLoop'' (i+1) t (_, sz) = Data.Array.IO.bounds a {- | FIXIO | TWO LOOPS | fix cpsfix | map/fold loop unboxed-m/f unboxed-loop --------+--------------------+-------------------------------------------------- 100000 | 0.71 1.37 | 1.60 1.61 0.90 0.42 250000 | 6.51 5.74 | 7.48 7.24 2.90 1.22 500000 | 23.88 21.32 | 25.35 26.38 7.51 2.27 1000000 | 92.72 79.16 | 97.83 105.73 21.78 4.54 -}