Juan Jose Garcia Ripoll <[EMAIL PROTECTED]> wrote:
> can anybody point me to tutorials, papers, etc, on how to properly
> annotate strictness in Haskell code? I am concerned with the following
> stupid piece of code that eats a lot of memory and takes an incredible
> amount of time to produce some output. I hope somebody will help me in
> finding what I am doing wrong.
I don't know of any general techniques, but there are a few
specific rules of thumb that apply here:
In strict languages like ML and Scheme, "foldl f" (or the
equivalent thereof) is preferable to "foldr f" (when given
the choice, i.e., if "f : a->a->a" is associative), since
"foldl" is tail-recursive. In lazy languages like Haskell,
"foldr" is (usually) preferable, since it's lazier -- if "f"
is lazy, then "foldr f z l" only needs to evaluate the first
term of "l", whereas "foldl f z l" needs to force all
of "l" before it can proceed. If "f" is strict, however,
both "foldr f z l" and "foldl f z l" will take up heap space
proportional to the length of "l", but "foldl'" -- the strict
version of "foldl" -- only takes constant space.
So the rule of thumb is: if "f" is associative and lazy,
use "foldr f". If "f" is associative and strict, use "foldl' f".
In general, try to avoid "foldl".
> produce :: Int -> Double -> Array Int Double
> produce n x = array (1,n) [(i,x) | i <- [1..n]]
>
> scprod :: Array Int Double -> Array Int Double -> Double
> scprod a b =
> case (bounds a, bounds b) of
> ((1,i), (1,j)) ->
> foldl (+) start [a!(x) * b!(x) | x <- [2..i]]
> where start = a!(1) * b!(1)
>
> main = print (show (scprod a a))
> where a = produce 1000000 1.0
Replacing "foldl" with "foldl'" in your program doesn't fix
the whole problem though. I suspect that the use of Arrays
is the culprit. Just trying to evaluate
(produce 1000000 1.0) ! 1
causes Hugs to run out of heap almost immediately on my machine,
and
(produce 10000 1.0) ! 1
succeeds, but runs for several seconds and 8 garbage collections
before returning an answer. This is because "array bds l" has
to fully evaluate the spine of "l", plus the first member
of each element of "l", before it can be subscripted. Moreover,
it leaves the second member of each element of "l" unevaluated
until it's requested, so the space behaviour of "array" is
particularly bad. The second rule of thumb, then, is
"avoid Haskell Arrays unless you really, really need
constant-time random access."
In this particular problem, 'scprod' consumes elements in
sequential order, so it may be better to use lists instead
of arrays. (In fact "scprod a a where a = produce n 1.0"
has a closed-form, O(1) solution, but I assume that's not
the problem you're really trying to solve :-)
Hope this helps,
--Joe English
[EMAIL PROTECTED]