Sergey

Thanks for your various messages. I've explained your results below. 
You are right to say that
it's hard to be sure what optimisations will happen when; arguably
that's a bad shortcoming of functional programming (especially the lazy sort).
Profiling tools help a bit.
I think, though, that you have found some particularly bad cases.
I'd be very interested in other people's war stories.

It would be great to make a "performance debugging manual" for
GHC, but it's hard to find the time to do so.  Any volunteers?

Simon



You have clearly understood why

        foldl1 max [1..n] 

takes lots of both heap and stack... first it builds up a huge
chain of suspended calls to max in the heap; and then when it finds
that it does need to evaluate them it takes a lot of stack to do so.

foldl_strict solves both problems by evaluating the accumulating
parameter as it goes:

        foldl_strict k z [] = z
        foldl_strict k z (x:xs) = let z1 = k z x
                                  in seq z1 (foldl_strict k z1 xs)

(I prefer "seq" to "strict", but its a matter of taste. Haskell provides
both.)  The seq forces z1 to be evaluated before the recursive call.

* Why isn't maximum in the Prelude defined using foldl_strict?  No
  reason: it should be.

* Why do sum2 and sum3 behave differently?

        sum2 xs = sum' 0 xs
                where
                   sum' s [] = s
                   sum' s (x:xs) = sum' (s+x) xs

        sum3 xs = foldl (+) 0 xs

Well, sum3 behaves badly for the same reason as before.  sum2 behaves
well because the compiler can see that sum' is strict in its first
argument; that is, it can see that sum' will eventually evaluate s.
That's what the strictness analyser does.  Having figured out that
the first argument of sum' is sure to be evaluated in the end, 
the compiler arranges to evaluate the first argument of sum' before
the call --- which gives us back something rather like foldl_strict,
and thus good performance.

Why can't the compiler spot that for sum3?  Because in general
foldl does not necessarily evaluate its second argument.

How can we get the good behaviour for sum3?  By defining foldl like
this:

        {-# INLINE foldl #-}
        foldl k z xs = f z xs 
                      where
                        f z [] = z
                        f z (x:xs) = f (k z x) xs

Now sum3's right-hand side will turn into sum2's right-hand side
(after inlining foldl) and all will be fine.  (If you use -O.)
The only down side is that some calls to foldl won't benefit
from the inlining and for those cases the code size might go up
a bit.  We should probably make this change to the Prelude.


* Why did you get different behaviour for sum1/sum2/sum3 etc
when you split the program over two modules?  I think it's 
because you didn't put in any type signatures.   When separately
compiled, ghc generated overloaded functions, which weren't
even sure that "+" is strict.  When compiled as one, it could
see that there was only one call to sumx, inline it at the
call site, and thereby discover which "+" you meant.

* Why does your call
        foldStrict addR (R 0 b) ss
take lots of space?

foldlStrict  _ z []     =  z
foldlStrict  f z (x:xs) =  let  y = f z x  in 
                                      y `seq` (foldlStrict f y xs)


data  R = R Int Int

addR (R x b) (R y _) =  R (x+y) b 


Reason: R is a lazy constructor.  So the addR calls get done 
strictly (by foldStrict), but the (x+y) expressions don't get
evaluated because R is lazy.  You can fix this (as you did) by
declaring R to be strict, or by putting a seq in addR:

addR (R x b) (R y _) =  let z = x+y in
                        z `seq` R z b 

That, in fact, is what happens when you put an "!" in R's declaration.




Reply via email to