folds and heap space [was: what is leaking?]

Patrik Jansson Sat, 27 Jun 1998 17:22:43 +0200 (MET DST)
On Fri, 26 Jun 1998, S. Alexander Jacobson wrote:

 | > myFoldl f s [] = s
 | > myFoldl f s (x:xs) = f s (foldl f x xs)
 | > total = 3 + (myFoldl (+) 3 [4..100000])
 | > total = 3 + 4 + (myFoldl (+) 4 [5..100000])

Just to avoid confusion: the function called myFoldl above is called
foldr in the Haskell prelude
  http://haskell.org/onlinereport/standard-prelude.html#$vfoldr

With foldr, if the supplied operator needs to evaluate its second
argument to do any real work (as in the case for (+)) a _very_ big
expression will be built before any real evaluation starts.
On the other hand, if the operator does work already when provided with
one argument, the caller can benefit from this result immediately.
Bad:  foldr (+)                   0  [1..100000]
Good: foldr (\x xs -> (x+1) : xs) [] [1..100000]

The real Haskell prelude foldl:
   foldl f z []     =  z
   foldl f z (x:xs) =  foldl f (f z x) xs
uses its second arg. as an accumulating parameter, so that even if f
really wants its second parameter to evaluate, that argument is present
immediately (the expression (f z x) on the RHS above).
The problem here is that when the standard lazy evaluation pull unfolds
the application of foldl f z it will only form the application (f z x)
without actually performing it. Thus we get just as big an
unevaluated expression here as in the foldr case. In some sense foldl
is worse than foldr, as no partial results can be obtained even if the
operator is productive with just one argument present. Only the "Bad" case
above applies - not the "Good". (The trick of supplying an operator that
is explicitly strict does not help either.)

Solution (as already suggested in a previous mail):
In hugs the function foldl'
  foldl'           :: Eval a => (a -> b -> a) -> a -> [b] -> a
  foldl' f a []     = a
  foldl' f a (x:xs) = strict (foldl' f) (f a x) xs
is a variant of foldl that really evaluates the application (f a x)
immediately. With this definition the summation above works fine:
Good: foldl' (+) 0 [1..100000]

Note:
  In favor of foldl: A strictness analyser phase might detect the problem
    and (effectively) transform to foldl'. 
[Implementors: does any implementation do this?]

In practice (see below) only foldl' is reasonable in this specific case.

Patrik Jansson

Quick experiment:
  I tried the three variants with the four widely available Haskell 
    implementations:

{-
foldl'           :: Eval a => (a -> b -> a) -> a -> [b] -> a
foldl' f a []     = a
foldl' f a (x:xs) = strict (foldl' f) (f a x) xs
main = print (foldl' (+) 0 [1..100000])

hugs: 7 s
hbc: 0.5 s
ghc: 0.5 s
nhc: 16 s
-}

{-
main = print (foldl (+) 0 [1..100000])

ghc: needs +RTS -K5M  (5Mb stack space)
hbc: needs -A300K  (300Kb pointer stack)
hugs: Segmentation fault
nhc: needs +RTS -H1500K (1.5M heap)
-}

{-
main = print (foldr (+) 0 [1..100000])

ghc: needs +RTS -K5M  (5M stack space)
hbc: needs -A300K  (300Kb pointer stack)
hugs: Control stack overflow
nhc: needs +RTS -H1500K (1.5M heap)
-}
folds and heap space [was: what is leaking?]

Reply via email to