RE: loop performance bug

Simon Peyton-Jones Tue, 15 Mar 2005 03:54:09 -0800

I couldn't resist looking into this.  Here's the story.  All of this
applies to the explicitly-recursive program; I have not looked at the
list version.


To preview the headlines:

Original program                                13.1s
1. Use Int instead of Bool                      12.6s
2. Use -fliberate-case-threshold=100    4.8s
3. Give instance with unsafeIndex               1.25s
4. {-# NOINLINE main #-}                        0.85s

Lessons
~~~~~
Going from 13s to 0.85s is big jump. 

A. It would be a big help if someone felt like doing this kind of
detective work in a more systematic way, and producing a list of
recommendations for the implementation.

B. Arising from (3), 'deriving Ix' should define unsafeIndex.  I'll take
that as an action.

C. Maybe the liberate-case threshold should be higher.  I'd be
interested to hear if increasing the threshold has good effects for
anyone else.

D. (4) actually only arises because everything is constant here.  I'm
not super-inclined to fix this dark corner.

Simon


The details
~~~~~~~
1.  IOUArrays are represented in a space-efficient manner.  In
particular, Bool arrays take one bit per element, but that entails lots
of bit-twiddling instructions.  Using Int instead reduces the time from
13.1 to 12.6s on my machine.

2.  The inner loop repeatedly takes apart the array data structure, the
pair representing its bounds, and the Pos structure for each bound, each
time around the loop.  GHC has an optimisation designed specifically to
catch this, called "liberate case" (only run with -O2).  To avoid code
duplication, though, it has a code-size threshold that is too low for
this example.  Try -fliberate-case-threshold=100.  This reduces the time
from 12.6s to 4.8s

3.  Array-bound checks are performed each time around the loop.
Actually, it's much worse than that.  The call
        writeArray arr (Pos x y) 1
turns into      
        unsafeWriteArray arr (index (bounds arr) (Pos x y)) 1

The derived method for index on Pos (generated by 'deriving Ix') looks
like this
        index (Pos l1 h2, Pos l2 h2) i j
          = index (l2,h2) j +
              index (l2,h2) h2 * index (l1,h1) i

And *each* of those calls to 'index (l,h) i' (done at type Int) checks
that i>=l and i<=h.  Result is six range checks for every access!

You might think we could use unsafeIndex instead:
        unsafeWriteArray arr (unsafeIndex (bounds arr) (Pos x y)) 1

but unfortunately the *derived* instance for Ix defines unsafeIndex =
index.  So you can instead give the Ix instance by hand:

instance Ix Pos where
  unsafeIndex (Pos l1 l2, Pos h1 h2) (Pos i j)
    = unsafeIndex (l2,h2) j + 
      (unsafeIndex (l2,h2) h2 + 1) * unsafeIndex (l1,h1) i
  inRange (Pos l1 l2, Pos h1 h2) (Pos i j)
    = inRange (l1,h1) i && inRange (l2,h2) j

This reduces the run-time to 1.25s

4.  Finally, there's a rather obscure problem.  You'd expect that since
the array is explicitly allocated with fixed bounds, the program might
be able to see those bounds when doing its index calculations.  It turns
out that a bad order of inlining stops this happening.  Just to see the
effect, I "fixed" the problem in two ways, both of which give the same
end result:
a) replace the (bounds arr) in the argument to unsafeIndex by (Pos 0 0,
Pos 99 99).
b) add an {-# NOINLINE main #-} pragme

I won't explain (b) -- try -ddump-simpl and see!  Anyway, this "fix"
means that computations like 99+1 can be constant folded, and reduces
the time to 0.85s




| -----Original Message-----
| From: [EMAIL PROTECTED]
[mailto:glasgow-haskell-bugs-
| [EMAIL PROTECTED] On Behalf Of Duncan Coutts
| Sent: 14 March 2005 13:41
| To: GHC-bugs list
| Subject: loop performance bug
| 
| Hi,
| 
| This sort of code runs very slowly when compared to the equivalent in
C:
| 
| > {-# OPTIONS -fglasgow-exts #-}
| > module Main where
| >
| > import Data.Array.MArray
| > import Data.Array.IO
| >
| > data Pos = Pos !Int !Int
| >   deriving (Eq, Ord, Ix)
| >
| > main = test1
| >
| > test1 :: IO ()
| > test1 = do
| >   (arr :: IOUArray Pos Bool) <- newArray_ (Pos 0 0, Pos 99 99)
| >   sequence_ [ sequence_ [ writeArray arr (Pos y x) False
| >                         | y <- [0..99]
| >                         , x <- [0..99] ]
| >             | _ <- [0..9999] ]
| >
| 
| Timing:
| 
| $ ghc --make -O2 SpeedTest.hs -o speedtest-hs
| $ time ./speedtest-hs
| 
| real    0m10.968s
| user    0m10.952s
| sys     0m0.005s
| 
| Comparing to an 'equivalent' C program:
| 
| > int main () {
| >   char arr[100][100];
| >   int n,x,y;
| >
| >   for (n = 0; n != 10000; n++)
| >     for (y = 0; y != 100; y++)
| >       for (x = 0; x != 100; x++)
| >         arr[y][x] = 0;
| > }
| 
| $ gcc -O2 speedtest.c -o speedtest-c
| $ time ./speedtest-c
| 
| real    0m0.129s
| user    0m0.123s
| sys     0m0.000s
| 
| Now parhaps this is unfair sice the Haskell program is written in
terms
| of lists. So lets write it using explicit looping and no lists.
| 
| 
| > {-# INLINE doFromTo #-}
| > -- do the action for [from..to] ,ie it's inclusive.
| > doFromTo :: Int -> Int -> (Int -> IO ()) -> IO ()
| > doFromTo from to action =
| >   let loop n | n > to   = return ()
| >              | otherwise = do action n
| >                               loop (n+1)
| >    in loop from
| >
| > test2 :: IO ()
| > test2 = do
| >   (arr :: IOUArray Pos Bool) <- newArray_ (Pos 0 0, Pos 99 99)
| >   doFromTo 0 9999 $ \_ ->
| >     doFromTo 0 99 $ \y ->
| >       doFromTo 0 99 $ \x ->
| >         writeArray arr (Pos y x) False
| 
| Timing:
| 
| $ ghc --make -O2 SpeedTest.hs -o speedtest-hs
| $ time ./speedtest-hs
| 
| real    0m7.942s
| user    0m7.936s
| sys     0m0.001s
| 
| So not much better really. :-(
| 
| If there's way to write loops that is faster I'd dearly like to know.
| 
| I initially assumed that the second version would run fast since there
| is no obvious need for any memory allocations (which is the usual
| culprit).
| 
| Duncan
| 
| _______________________________________________
| Glasgow-haskell-bugs mailing list
| [email protected]
| http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs
_______________________________________________
Glasgow-haskell-bugs mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs

RE: loop performance bug

Reply via email to