Hi again, all.

So I rewrote some of the versions, so there are now six versions of the
array normalization code.  They are:

normal:         combination of foldM and mapM_
loop:           a two-pass loop mimicking foldM and mapM_
unboxed-normal: normal on unboxed arrays
unboxed-loop:   loop on unboxed arrays
fix:            using fixIO and a look with Double accumulator
cpsfix:         using fixIO and a CPS accumulator

I ran each of these on arrays of size 100,000, 250,000, 500,000 and
1,000,000 elements.  The results are:

        |       FIXIO        |                  TWO LOOPS
        |  fix       cpsfix  |   map/fold    loop    unboxed-m/f
unboxed-loop
--------+--------------------+--------------------------------------------------
 100000 |  0.71       1.37   |     1.60      1.61       0.90          0.42      
 250000 |  6.51       5.74   |     7.48      7.24       2.90          1.22      
 500000 | 23.88      21.32   |    25.35     26.38       7.51          2.27      
1000000 | 92.72      79.16   |    97.83    105.73      21.78          4.54      

(sorry if that wraps on your screen -- see the attached file)

So, looking at the FIXIO methods, for large arrays, cpsfix seems to
dominate.  There's a small overhead for small arrays, but this is passed
once we get to 250,000 elements (probably much before).

I'm (pleasantly) surprised that both fix and cps fix consistently beat the
two boxed implementations on the right.  I'm shocked that the handwritten
loop version does worse than the map/fold version and cannot explain this.

Yet, when looking at the unboxed arrays, the map/fold version does *much*
worse than the loop version.  I'm guessing this has to do with the fact
that in the loop version the compiler is unboxing the index variable for
the whole loop, rather than unboxing each element in the list for map/fold
one-by-one.

While I'm happy that the fix versions outperform the 2-pass versions for
boxed arrays, the discrepency between 79.16 seconds for one million
elements and 4.54 sectons on the same data is alarming.  Can anyone
suggest a way to reconcile this?

 - Hal

p.s., I've attached the code and results (as comments in the code).


--
Hal Daume III

 "Computer science is no more about computers    | [EMAIL PROTECTED]
  than astronomy is about telescopes." -Dijkstra | www.isi.edu/~hdaume

On Sat, 28 Sep 2002, Levent Erkok wrote:

> On Friday 27 September 2002 05:19 pm, Hal Daume III wrote:
> > There is a one pass solution using mfix.  This looks like:
> >
> >   mfix (\ ~s -> ioaA a 1 s)
> >
> > where  ioaA is defined as:
> >
> >   ioaA a i s
> >
> >     | i > sz = return 0
> >     | otherwise =
> >
> >         do s' <- ioaA a (i+1) s
> >            v  <- readArray a i
> >            writeArray a i (v/s)
> >            return (s' + v)
> >     where (_,sz) = Data.Array.IO.bounds a
> >
> > Using unboxed arrays is not possible in the fixed point version
> > (loop).  On the normal version, it just about triples the speed across the
> > board.
> 
> Hi,
> 
> I'm not sure if it's mfix we should blame here. Can you please 
> experiment with the following version of normalize and report on 
> running times?
> 
>   norm a = mdo s <- ioaA 1 s id
>                return ()
>     where (_, sz) = bounds a
>           ioaA i s acc
>            | i > sz = return (acc 0)
>            | True   =
>                do v <- readArray a i
>                   writeArray a i (v/s)
>                   ioaA (i+1) s (\x -> v + acc x)
> 
> It'd be interesting to see the results especially for very large 
> inputs (i.e. >= half a million elements.)
> 
> -Levent.
> 

module Main where

import Data.Array
import Data.Array.IO
import Control.Monad
import Control.Monad.Fix
import System


ioaA a s = ioaA' 1 s 0
    where ioaA' i s acc
              | i > sz = return acc
              | True   = do v <- readArray a i
                            writeArray a i (v / s)
                            ioaA' (i+1) s $! (v + acc)
          (_, sz) = Data.Array.IO.bounds a

ioaACPS a s = ioaACPS' 1 s id
    where (_, sz) = Data.Array.IO.bounds a
          ioaACPS' i s acc
           | i > sz = return (acc 0)
           | True   =
               do v <- readArray a i
                  writeArray a i (v/s)
                  ioaACPS' (i+1) s (\x -> v + acc x)


main = do
  [method,n] <- getArgs
  (a::IOArray Int Double) <- newListArray (1::Int,read n) [(1::Double)..read n]
  if method == "fix"
    then mfix (\ ~s -> ioaA a s) >> return ()
    else if method == "cps"
           then mfix (\ ~s -> ioaACPS a s) >> return ()
           else if method == "loop"
                  then normLoop a
                  else norm a
  return ()

norm a = do
  t <- foldM (\t i -> (t+) `liftM` readArray a i) 0 [1..sz]
  mapM_ (\i -> readArray a i >>= writeArray a i . (/t)) [1..sz]
  where (_,sz) = Data.Array.IO.bounds a

normLoop a = do t <- normLoop' 1 0
                normLoop'' 1 t
    where normLoop' i acc
              | i > sz = return acc
              | True   = do v <- readArray a i
                            normLoop' (i+1) $! (v + acc)
          normLoop'' i t
              | i > sz = return ()
              | True   = do v <- readArray a i
                            writeArray a i (v/t)
                            normLoop'' (i+1) t
          (_, sz) = Data.Array.IO.bounds a

{-
        |       FIXIO        |                  TWO LOOPS
        |  fix       cpsfix  |   map/fold    loop    unboxed-m/f  unboxed-loop
--------+--------------------+--------------------------------------------------
 100000 |  0.71       1.37   |     1.60      1.61       0.90          0.42      
 250000 |  6.51       5.74   |     7.48      7.24       2.90          1.22      
 500000 | 23.88      21.32   |    25.35     26.38       7.51          2.27      
1000000 | 92.72      79.16   |    97.83    105.73      21.78          4.54      
-}

Reply via email to