[Haskell-cafe] Reifying case expressions [was: Re: On stream processing, and a new release of timeplot coming]

Heinrich Apfelmus Sun, 25 Dec 2011 02:26:53 -0800

Eugene Kirpichov wrote:

In the last couple of days I completed my quest of making my graphing
utility timeplot ( http://jkff.info/software/timeplotters ) not load the
whole input dataset into memory and consequently be able to deal with
datasets of any size, provided however that the amount of data to *draw* is
not so large. On the go it also got a huge speedup - previously visualizing
a cluster activity dataset with a million events took around 15 minutes and
a gig of memory, now it takes 20 seconds and 6 Mb max residence.
(I haven't yet uploaded to hackage as I have to give it a bit more testing)


The refactoring involved a number of interesting programming patterns that
I'd like to share with you and ask for feedback - perhaps something can be
simplified.

The source is at http://github.com/jkff/timeplot

The datatype of incremental computations is at
https://github.com/jkff/timeplot/blob/master/Tools/TimePlot/Incremental.hs .
Strictness is extremely important here - the last memory leak I eliminated
was lack of bang patterns in teeSummary.

Your StreamSummary type has a really nice interpretation: it's areification of case expressions.


For instance, consider the following simple function from lists to integers

    length :: [a] -> Int
    length xs = case xs of
        []     -> 0
        (y:ys) -> 1 + length ys

We want to reify the case expression as constructor of a data type. Whattype should it have? Well, a case expression maps a list xs to aresult, here of type Int, via two cases: the first case gives a resultand the other maps a value of type a to a function from lists toresults again. This explanation was probably confusing, so I'll just goahead and define a data type that represents functions from lists [a]to some result of type r


    data ListTo a r = CaseOf r (a -> ListTo a r)

    interpret :: ListTo a r -> ([a] -> r)
    interpret (CaseOf nil cons) xs =
        case xs of
            []     -> nil
            (y:ys) -> interpret (cons y) ys

As you can see, we are just mapping each CaseOf constructor back to abuilt-in case expression.

Likewise, each function from lists can be represented in terms of ournew data type: simply replace all built-in case expressions with the newconstructor


    length' :: ListTo a Int
    length' = CaseOf
        (0)
        (\x -> fmap (1+) length')

    length = interpret length'

The CaseOf may look a bit weird, but it's really just a straightforwardtranslation of the case expression you would use to define the functiongo instead.

Ok, this length function is really inefficient because it builds a hugeexpression of the form (1+(1+...)). Let's implement a strict variantinstead


    lengthL :: ListTo a Int
    lengthL = go 0
        where
        go !n = CaseOf (n) (\x -> go (n+1))

While we're at it, let's translate two more list functions

    foldL' :: (b -> a -> b) -> b -> ListTo a b
    foldL' f b = Case b (\a -> foldL' f $! f b a)

    sumL    :: ListTo Int Int
    sumL    = foldL' (\b a -> a+b) 0

And now we can go for the point of this message: unlike ordinaryfunctions from lists, we can compose these in lock-step! In particular,the following applicative instance


    instance Applicative (ListTo a) where
        pure b = CaseOf b (const $ pure b)
        (CaseOf f fs) <*> (CaseOf x xs) =
            CaseOf (f x) (\a -> fs a <*> xs a)

allows us to write a function

    average :: ListTo Int Double
    average = divide <$> sumL <*> lengthL
        where
        divide a b = fromIntegral a / fromIntegral b

that runs in constant space! Why does this work? Well, since we can nowinspect case expressions, we can choose to evaluate them in lock-step,essentially computing sum and length with just one pass over theinput list. Remember that the original definition


    average xs = sum xs / length xs

has a space leak because the input list xs is being shared.


Remarks:

1. Reified case expressions are, of course, the same thing as Iteratees,modulo chunking and weird naming.

2. My point is topped by scathing irony: if Haskell had a form of*partial evaluation*, we could write applicative combinators for*ordinary* functions [a] -> r and express average in constant space.In other words, partial evaluation would make it unnecessary to reifycase expressions for the purpose of controlling performance / space leaks.



Best regards,
Heinrich Apfelmus

--
http://apfelmus.nfshost.com


_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Reifying case expressions [was: Re: On stream processing, and a new release of timeplot coming]

Reply via email to