Gabriel, isn't Edward Kmett's Machines.Process an example of another strategy that evades that quadratic blowup? http://hackage.haskell.org/package/machines-0.2.3.1/docs/Data-Machine-Process.html (Or is this a totally unrelated problem?)
On Thu, Feb 20, 2014 at 11:59 PM, Gabriel Gonzalez <[email protected]> wrote:

> So the one part of Michael's post that was correct was that each use of
> `zoom` adds a constant O(1) overhead to every subsequent command. The
> worst case scenario is a quadratic blowup for code like this:
>
>     forever $ do
>         a <- zoom lens parser
>         doSomethingWith a
>
> Every iteration of `forever` will accumulate an additional O(1)
> overhead, so by the Nth iteration you get O(N) overhead per iteration,
> and the total cost of N iterations is O(N^2). This is what was
> happening in your code, since your code is basically of the form:
>
>     replicateM_ n $ do
>         a <- zoom lens parser
>         doSomethingWith a
>
> ... so that triggers O(N^2) space and time complexity, where `n` in
> your example was `10000`.
>
> I'm working on a solution to fix this and I will summarize it in broad
> strokes here. This solution is backwards-compatible with the current
> API and in most use cases will compile to code that is more efficient
> (both in time complexity and in constant factors). This would solve the
> problem that Michael described.
>
> There would be a new `Pipes.Parse.Fast` internal module (for
> implementers only) which will have two new types. One of them is a more
> efficient analog of `Lens'`. Let's call it `LensFast'` for lack of a
> better name.
>
> I will omit the implementation of `LensFast'` since I'm still playing
> with different ideas, but I can describe its interface. First, it would
> form a category, like `Lens'`es:
>
>     (.) :: LensFast' a b -> LensFast' b c -> LensFast' a c
>
>     id :: LensFast' a a
>
> ... and there would be a functor from `LensFast'` to `Lens'`:
>
>     optimal :: LensFast' a b -> Lens' a b
>
> ... that obeys these functor laws:
>
>     optimal (f . g) = optimal f . optimal g
>
>     optimal id = id
>
> There would also be a more efficient parser type, which I will
> temporarily call `ParserFast`.
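To make that interface concrete, here is a minimal sketch. The email deliberately leaves the representation of `LensFast'` open, so the newtype wrapper, `composeFast`, and `idFast` below are placeholders invented for illustration, not the real internals:

```haskell
{-# LANGUAGE RankNTypes #-}

-- A van Laarhoven lens, as in the `lens-family` package
type Lens' a b = forall f. Functor f => (b -> f b) -> a -> f a

-- Placeholder for the optimized representation, which the email leaves
-- unspecified; wrapping an ordinary lens is enough to show the interface.
newtype LensFast' a b = LensFast' (Lens' a b)

-- Composition and identity with the types given in the email
composeFast :: LensFast' a b -> LensFast' b c -> LensFast' a c
composeFast (LensFast' f) (LensFast' g) = LensFast' (f . g)

idFast :: LensFast' a a
idFast = LensFast' id

-- The functor into ordinary lenses; with this placeholder representation
-- both functor laws hold definitionally.
optimal :: LensFast' a b -> Lens' a b
optimal (LensFast' l) = l
```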
> This representation I've already nailed down, and it would be
> equivalent to a free monad transformer over the `draw`/`unDraw`
> primitive operations, with an optional bind in the base monad for
> efficiency:
>
>     data ParserFast a m r
>         = M (m (ParserFast a m r))
>         | Draw (Maybe a -> ParserFast a m r)
>         | UnDraw a (ParserFast a m r)
>
>     instance Monad m => Monad (ParserFast a m) where ...
>
>     instance MonadTrans (ParserFast a) where ...
>
> This would come with an `interpret` function that would be a monad
> morphism from `ParserFast` to `Parser`:
>
>     interpret :: Monad m => ParserFast a m r -> Parser a m r
>
> This would obey the following monad morphism laws:
>
>     interpret (do { x <- m; f x }) = do { x <- interpret m; interpret (f x) }
>
>     interpret (return x) = return x
>
> ... and monad morphism laws are also functor laws if you view them from
> this perspective:
>
>     interpret . (f >=> g) = interpret . f >=> interpret . g
>
>     interpret . return = return
>
> The last piece of the puzzle is a `zoom_` function designed to work
> with optimized `LensFast'`s and `ParserFast`s. It would have this type:
>
>     zoom_ :: Monad m => LensFast' a b -> ParserFast b m r -> ParserFast a m r
>
> ... and it would obey this equation:
>
>     zoom (optimal lensFast) (interpret parserFast)
>         = interpret (zoom_ lensFast parserFast)
>
> That equation would be a rewrite rule that turns the inefficient `zoom`
> into the much more efficient `zoom_` that doesn't do unnecessary
> round-trips or trigger quadratic time complexity.
>
> The way you put it all together is that the default user-facing API
> would define all lenses and parsers in terms of `optimal` and
> `interpret`, i.e.:
>
>     exampleParser = interpret internalParserFast
>
>     exampleLens = optimal internalLensFast
>
> Then if the user writes something like:
>
>     zoom exampleLens exampleParser
>
> ...
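For concreteness, the elided instances could be filled in roughly as follows. Note that the declaration as stated in the email has no constructor for pure results, so this sketch adds a `Pure` leaf (without one, `return` cannot be written); treat the whole block as an illustration of the shape, not the actual `pipes-parse` internals:

```haskell
import Control.Monad (ap, liftM)
import Control.Monad.Trans.Class (MonadTrans (..))

data ParserFast a m r
    = Pure r                              -- added here: a pure result
    | M (m (ParserFast a m r))            -- an interleaved bind in the base monad
    | Draw (Maybe a -> ParserFast a m r)  -- request the next input element
    | UnDraw a (ParserFast a m r)         -- push an element back onto the input

instance Monad m => Functor (ParserFast a m) where
    fmap = liftM

instance Monad m => Applicative (ParserFast a m) where
    pure  = Pure
    (<*>) = ap

instance Monad m => Monad (ParserFast a m) where
    Pure r     >>= f = f r
    M m'       >>= f = M (fmap (>>= f) m')
    Draw k     >>= f = Draw (\ma -> k ma >>= f)
    UnDraw a p >>= f = UnDraw a (p >>= f)

instance MonadTrans (ParserFast a) where
    lift m' = M (fmap Pure m')

-- The draw/unDraw primitive operations themselves:
draw :: ParserFast a m (Maybe a)
draw = Draw Pure

unDraw :: a -> ParserFast a m ()
unDraw a = UnDraw a (Pure ())
```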
> then `exampleParser` and `exampleLens` would be written to force
> inlining early, exposing the `interpret` and `optimal` functions:
>
>     zoom (optimal internalLensFast) (interpret internalParserFast)
>
> ... then that would trigger the rewrite rule:
>
>     interpret (zoom_ internalLensFast internalParserFast)
>
> ... and now the code would be efficient and avoid the problems Michael
> described.
>
> You can also optimize some other cases automatically, too. For example,
> if the user composes two lenses, which are each internally optimized:
>
>     lens1 = optimal internalLens1
>     lens2 = optimal internalLens2
>
>     parser = interpret internalParser
>
>     example = zoom (lens1 . lens2) parser
>
> ... and that is inlined:
>
>     example = zoom (optimal internalLens1 . optimal internalLens2)
>                    (interpret internalParser)
>
> ... then you can provide a rewrite rule using the first functor law for
> `optimal` to rewrite the two lenses to a single optimized lens:
>
>     example = zoom (optimal (internalLens1 . internalLens2))
>                    (interpret internalParser)
>
> That is now in the right form for the `zoom_` rewrite rule and
> everything just works:
>
>     example = interpret (zoom_ (internalLens1 . internalLens2)
>                                internalParser)
>
> So far so good! We can expect those rewrite rules to fire reliably and
> generate efficient code.
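In GHC's `RULES` syntax, the two rewrites described above would presumably look something like the fragment below. This is a guess at the shape only: the rule names are invented, and in practice `zoom`, `optimal`, `interpret`, and `zoom_` would need `NOINLINE` or phase annotations so the rules get a chance to fire before the optimizer inlines the definitions away:

```haskell
{-# RULES
"zoom/interpret"  forall l p.
    zoom (optimal l) (interpret p) = interpret (zoom_ l p) ;
"optimal/optimal" forall f g.
    optimal f . optimal g = optimal (f . g)
  #-}
```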
> The problematic part is where you `zoom` over multiple parsers like
> this:
>
>     lens = optimal internalLens
>     parser1 = interpret internalParser1
>     parser2 = interpret internalParser2
>
>     example = zoom lens $ do
>         x <- parser1
>         y <- parser2
>         return (x, y)
>
> If you inline those you get:
>
>     example = zoom (optimal internalLens) $ do
>         x <- interpret internalParser1
>         y <- interpret internalParser2
>         return (x, y)
>
> In theory you could use the monad morphism laws for `interpret` to
> factor out the `interpret`:
>
>     example = zoom (optimal internalLens) $ interpret $ do
>         x <- internalParser1
>         y <- internalParser2
>         return (x, y)
>
> Now that's in a form that can be optimized by the rewrite rule for
> `zoom_`:
>
>     example = interpret $ zoom_ internalLens $ do
>         x <- internalParser1
>         y <- internalParser2
>         return (x, y)
>
> **HOWEVER**, in practice I don't yet know a way to get the monad
> morphism rewrite rules to factor out the `interpret` in all cases,
> which is necessary to trigger the `zoom_` rewrite rule. This is where
> Haskell's rewrite rule system falls flat on its face. If anybody wants
> a longer explanation of why, I can elaborate more on this.
>
> In practice, I'm willing to put up with this inefficiency and just
> explain to users how to work around this latter issue by either:
>
> a) distributing the `zoom` over the parsers, like this:
>
>     example = do
>         x <- zoom lens parser1
>         y <- zoom lens parser2
>         return (x, y)
>
> Note that this distribution only works if the lens does not violate the
> monad morphism laws for `zoom`. Fortunately, the lenses that trigger
> this performance issue are precisely the same lenses that satisfy the
> monad morphism laws for `zoom` (example: the `decodeUtf8` lens). Vice
> versa, the lenses that violate the `zoom` monad morphism laws are the
> ones that give excellent performance in conjunction with `zoom` (i.e.
> delimiting lenses like `splitAt`).
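As an aside, why distributing `zoom` is safe for morphism-respecting lenses can be seen in a simpler setting. The following self-contained sketch uses plain `State`, not pipes-parse; `zoomState`, `getL`, and `setL` are hypothetical stand-ins written for this illustration. It checks that zooming through a field-accessing lens distributes over `>>=`:

```haskell
{-# LANGUAGE RankNTypes #-}

import Control.Monad.Trans.State (State, get, put, runState, state)
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

type Lens' a b = forall f. Functor f => (b -> f b) -> a -> f a

getL :: Lens' a b -> a -> b
getL l = getConst . l Const

setL :: Lens' a b -> b -> a -> a
setL l b = runIdentity . l (\_ -> Identity b)

-- A zoom for plain State, defined via get/set; for lenses like this it
-- is a monad morphism, which is exactly what licenses workaround (a).
zoomState :: Lens' a b -> State b r -> State a r
zoomState l m = state $ \a ->
    let (r, b') = runState m (getL l a)
    in  (r, setL l b' a)

-- A lens onto the first component of a pair
_1 :: Lens' (x, y) x
_1 k (x, y) = fmap (\x' -> (x', y)) k x

bump :: State Int Int
bump = do n <- get; put (n + 1); return n

main :: IO ()
main = print (lhs == rhs)  -- prints True: both are ((0,1),(2,"s"))
  where
    lhs = runState (zoomState _1 (do a <- bump; b <- bump; return (a, b)))
                   (0 :: Int, "s")
    rhs = runState (do a <- zoomState _1 bump
                       b <- zoomState _1 bump
                       return (a, b))
                   (0 :: Int, "s")
```

For a delimiting lens like `splitAt` the analogous equation fails, because each separate `zoom` re-delimits the stream, which is why (per the email) those lenses both perform well and refuse to distribute.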
> b) exposing the internal lenses and parsers so that users can use them
> directly rather than relying on rewrite rules.
>
> I prefer approach (a) because then we can keep the internals hidden.
>
> Note that this solution requires that `pipes-parse` provide its own
> `zoom` function (which is really easy), or alternatively define the
> rewrite rule in terms of the `zoom` function from the `lens-family`
> package. A rewrite rule in terms of the `zoom` function from `lens` is
> out of the question since I don't want a `lens` dependency.
>
> Does that answer your question?
>
> On 02/21/2014 03:37 AM, Michael Thompson wrote:
>
>> I was looking at Michael Snoyman's post on pipes parsers,
>>
>> http://www.yesodweb.com/blog/2014/02/ideas-pipes-parse
>>
>> and working through some of the reddit controversy,
>>
>> http://www.reddit.com/r/haskell/comments/1xmmtn/some_ideas_for_pipesparse/
>>
>> trying to think about whether it might matter to `pipes-text`.
>>
>> I worked from the attempted pipes variant of his conduit parser for a
>> special archival format with a 'File' type. Here is the result I will
>> discuss below:
>>
>> https://gist.github.com/michaelt/9122379
>>
>> The first form of the parser (`fileParser0`) was not so bad when I
>> used it on simple files. Then I increased the length of the file to be
>> parsed into a succession of `File`s, resulting in a 1M file. (You can
>> write a suitable file by uncommenting the first line in `main`.)
>>
>> On 1M of data, the obvious ways of formulating the parser in the
>> pipes-parse style seem to go wildly wrong. The extravagance of my
>> computer's response -- "swap space full", etc. -- may have something
>> to do with OS X vs. GHC 7.6, but they are obviously bad programs. I
>> couldn't get them to complete.
>>
>> That is the result if I use either of `fileParser0` or `fileParser1`.
>> If however I use `parse2`, where I limit each use of `zoom utf8` by a
>> prior application of `PB.splitAt` (or `break`), the 1M file is done in
>> a half-second and uses no significant memory.
>>
>> Is there a mistake I am making, a mistake in the implementation of
>> something, or some simple rule that can be stated to avoid such
>> things?
>>
>> Also, I wonder if the result of using e.g. `fileParser1` is as
>> pathological on other OSes.
>>
>> yours, Michael

--
You received this message because you are subscribed to the Google Groups "Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
