Hello Heinrich, Thanks, that's sure some food for thought!
A few notes: * This is indeed analogous to Iteratees. I tried doing the same with Iteratees but failed, so I decided to put together something simple of my own. * The Applicative structure over this stuff is very nice. I was thinking, what structure to put on - and Applicative seems the perfect fit. It's also possible to implement Arrows - but I once tried and failed; however, I was trying that for a more complex stream transformer datatype (a hybrid of Iteratee and Enumerator). * StreamSummary is trivially a bifunctor. I actually wanted to make it an instance of Bifunctor, but it was in the category-extras package and I hesitated to reference this giant just for this purpose :) Probably bifunctors should be in prelude. * Whereas StreamSummary a r abstracts deconstruction of lists, the dual datatype (StreamSummary a r ->) abstracts construction; however I just now (after looking at your first definition of length) understood that it is trivially isomorphic to the regular list datatype - you just need to be non-strict in the state - listify :: ListTo a [a] = CaseOf [] (\x -> fmap (x:) listify). So you don't need functions of the form (forall r . ListTo a r -> ListTo b r) - you just need (ListTo b [a]). This is a revelation for me. On Sun, Dec 25, 2011 at 2:25 PM, Heinrich Apfelmus < apfel...@quantentunnel.de> wrote: > Eugene Kirpichov wrote: > >> In the last couple of days I completed my quest of making my graphing >> utility timeplot ( >> http://jkff.info/software/**timeplotters<http://jkff.info/software/timeplotters>) >> not load the >> whole input dataset into memory and consequently be able to deal with >> datasets of any size, provided however that the amount of data to *draw* >> is >> not so large. On the go it also got a huge speedup - previously >> visualizing >> a cluster activity dataset with a million events took around 15 minutes >> and >> a gig of memory, now it takes 20 seconds and 6 Mb max residence. >> (I haven't yet uploaded to hackage as I have to give it a bit more >> testing) >> >> The refactoring involved a number of interesting programming patterns that >> I'd like to share with you and ask for feedback - perhaps something can be >> simplified. >> >> The source is at >> http://github.com/jkff/**timeplot<http://github.com/jkff/timeplot> >> >> The datatype of incremental computations is at >> https://github.com/jkff/**timeplot/blob/master/Tools/** >> TimePlot/Incremental.hs<https://github.com/jkff/timeplot/blob/master/Tools/TimePlot/Incremental.hs>. >> Strictness is extremely important here - the last memory leak I eliminated >> was lack of bang patterns in teeSummary. >> > > Your StreamSummary type has a really nice interpretation: it's a > reification of case expressions. > > For instance, consider the following simple function from lists to integers > > length :: [a] -> Int > length xs = case xs of > [] -> 0 > (y:ys) -> 1 + length ys > > We want to reify the case expression as constructor of a data type. What > type should it have? Well, a case expression maps a list xs to a result, > here of type Int, via two cases: the first case gives a result and the > other maps a value of type a to a function from lists to results again. > This explanation was probably confusing, so I'll just go ahead and define a > data type that represents functions from lists [a] to some result of type > r > > data ListTo a r = CaseOf r (a -> ListTo a r) > > interpret :: ListTo a r -> ([a] -> r) > interpret (CaseOf nil cons) xs = > case xs of > [] -> nil > (y:ys) -> interpret (cons y) ys > > As you can see, we are just mapping each CaseOf constructor back to a > built-in case expression. > > Likewise, each function from lists can be represented in terms of our new > data type: simply replace all built-in case expressions with the new > constructor > > length' :: ListTo a Int > length' = CaseOf > (0) > (\x -> fmap (1+) length') > > length = interpret length' > > The CaseOf may look a bit weird, but it's really just a straightforward > translation of the case expression you would use to define the function go > instead. > > Ok, this length function is really inefficient because it builds a huge > expression of the form (1+(1+...)). Let's implement a strict variant > instead > > lengthL :: ListTo a Int > lengthL = go 0 > where > go !n = CaseOf (n) (\x -> go (n+1)) > > While we're at it, let's translate two more list functions > > foldL' :: (b -> a -> b) -> b -> ListTo a b > foldL' f b = Case b (\a -> foldL' f $! f b a) > > sumL :: ListTo Int Int > sumL = foldL' (\b a -> a+b) 0 > > > And now we can go for the point of this message: unlike ordinary functions > from lists, we can compose these in lock-step! In particular, the following > applicative instance > > instance Applicative (ListTo a) where > pure b = CaseOf b (const $ pure b) > (CaseOf f fs) <*> (CaseOf x xs) = > CaseOf (f x) (\a -> fs a <*> xs a) > > allows us to write a function > > average :: ListTo Int Double > average = divide <$> sumL <*> lengthL > where > divide a b = fromIntegral a / fromIntegral b > > that runs in constant space! Why does this work? Well, since we can now > inspect case expressions, we can choose to evaluate them in lock-step, > essentially computing sum and length with just one pass over the input > list. Remember that the original definition > > average xs = sum xs / length xs > > has a space leak because the input list xs is being shared. > > > Remarks: > 1. Reified case expressions are, of course, the same thing as Iteratees, > modulo chunking and weird naming. > > 2. My point is topped by scathing irony: if Haskell had a form of *partial > evaluation*, we could write applicative combinators for *ordinary* > functions [a] -> r and express average in constant space. In other > words, partial evaluation would make it unnecessary to reify case > expressions for the purpose of controlling performance / space leaks. > > > Best regards, > Heinrich Apfelmus > > -- > http://apfelmus.nfshost.com > > > ______________________________**_________________ > Haskell-Cafe mailing list > Haskell-Cafe@haskell.org > http://www.haskell.org/**mailman/listinfo/haskell-cafe<http://www.haskell.org/mailman/listinfo/haskell-cafe> > -- Eugene Kirpichov Principal Engineer, Mirantis Inc. http://www.mirantis.com/ Editor, http://fprog.ru/
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe