ok, got it, i'll stare at this when i'm more awake. :)
On Fri, Feb 21, 2014 at 12:40 AM, Gabriel Gonzalez <[email protected]> wrote:

> This is not related to the quadratic issue from the naive free monad
> bind. Switching to the codensity transformation would not fix the
> problem at all.
>
> The tl;dr way to convince yourself of this is to just compare the
> performance of the following two parsers:
>
>     example1 = forever $ zoom id draw
>
>     example2 = forever draw
>
> The former will trigger the quadratic blowup whereas the second one
> will not. The issue is `zoom`, not `forever`.
>
> In more detail, the issue is that `zoom` currently works like this:
>
> * Decode all input elements (lazily)
> * Run the parser on that
> * Encode all input elements (lazily)
>
> So let's say that your input has 1000 elements and your parser only
> consumes 2 elements. After it's done parsing those two elements, the
> remaining 998 elements will all still be decoded and re-encoded. This
> decoding/re-encoding is lazy, but it's not free, and it triggers when
> you demand each element, adding an O(1) overhead per element.
>
> Now, imagine that you call `zoom` N times. Every subsequent element
> will then be decoded and re-encoded N times. This means that if you
> call `zoom` N times the total running time is O(N^2)
> (i.e. N * (N + 1) / 2).
>
> On 02/21/2014 12:31 PM, Carter Schonwald wrote:
>
> Gabriel, isn't Edward K's Machines.Process an example of another
> strategy that evades that quadratic blowup?
>
> http://hackage.haskell.org/package/machines-0.2.3.1/docs/Data-Machine-Process.html
> (or is this a totally unrelated problem?)
>
> On Thu, Feb 20, 2014 at 11:59 PM, Gabriel Gonzalez <[email protected]> wrote:
>
>> So the one part of Michael's post that was correct was that each use
>> of `zoom` adds a constant O(1) overhead to every subsequent command.
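To make that per-element overhead concrete, here is a toy model of the mechanism (my own invention for illustration; this is not the pipes-parse API): a "parser" is just a state-passing function over a list of leftovers, and `zoomish` plays the role of `zoom` for an invertible decode/encode pair.

```haskell
-- A "parser" is modeled as a state-passing function over a list of
-- leftovers (a stand-in for the real pipes-parse `Parser`):
type ToyParser a r = [a] -> (r, [a])

-- Draw a single element, if any:
draw1 :: ToyParser a (Maybe a)
draw1 []     = (Nothing, [])
draw1 (x:xs) = (Just x,  xs)

-- `zoomish` models `zoom`: decode the whole stream (lazily), run the
-- parser, then re-encode the unconsumed remainder.  Even if the parser
-- consumes only 2 of 1000 elements, the other 998 come back wrapped in
-- (map enc . map dec), which adds O(1) per element when they are later
-- demanded:
zoomish :: (a -> b) -> (b -> a) -> ToyParser b r -> ToyParser a r
zoomish dec enc p input =
    let (r, leftover) = p (map dec input)
    in  (r, map enc leftover)

-- Two sequential zoomed draws: the second call's input has already
-- passed through one decode/re-encode layer, the third would see two,
-- and so on -- after N calls the leftovers sit under N layers, which
-- is the O(N^2) total described above:
twoDraws :: ToyParser Int (Maybe Int, Maybe Int)
twoDraws input0 =
    let (x, input1) = zoomish id id draw1 input0
        (y, input2) = zoomish id id draw1 input1
    in  ((x, y), input2)
```

Nothing here forces the layers early; as in the real library, the cost only shows up when the wrapped leftovers are demanded.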
>> The worst case scenario is a quadratic blowup for code like this:
>>
>>     forever $ do
>>         a <- zoom lens parser
>>         doSomethingWith a
>>
>> Every iteration of `forever` will accumulate an additional O(1)
>> overhead, so by the Nth iteration you get O(N) overhead per
>> iteration, and the total cost of N iterations is O(N^2). This is what
>> was happening in your code, since your code is basically of the form:
>>
>>     replicateM_ n $ do
>>         a <- zoom lens parser
>>         doSomethingWith a
>>
>> ... so that triggers O(N^2) space and time complexity, where `n` in
>> your example was `10000`.
>>
>> I'm working on a solution to fix this and I will summarize it in
>> broad strokes here. This solution is backwards-compatible with the
>> current API and in most use cases will compile to code that is more
>> efficient (both in time complexity and constant factors). This would
>> solve the problem that Michael described.
>>
>> There would be a new `Pipes.Parse.Fast` internal module (for
>> implementers only) which will have two new types. One of them will be
>> a more efficient analog of `Lens'`. Let's call it `LensFast'` for
>> lack of a better name.
>>
>> I will omit the implementation of `LensFast'` since I'm still playing
>> with different ideas, but I can describe its interface. First, it
>> would form a category, like `Lens'`es:
>>
>>     (.) :: LensFast' a b -> LensFast' b c -> LensFast' a c
>>
>>     id :: LensFast' a a
>>
>> ... and there would be a functor from `LensFast'` to `Lens'`:
>>
>>     optimal :: LensFast' a b -> Lens' a b
>>
>> ... that obeys these functor laws:
>>
>>     optimal (f . g) = optimal f . optimal g
>>
>>     optimal id = id
>>
>> There would also be a more efficient parser type, which I will
>> temporarily call `ParserFast`.
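One way that interface *could* be realized (purely a guess on my part; the thread explicitly leaves the representation open, and all names below are invented) is as a decode/encode pair, in the spirit of the `decodeUtf8` lens:

```haskell
{-# LANGUAGE RankNTypes #-}

import Prelude hiding ((.), id)
import qualified Prelude

-- Hypothetical representation of `LensFast'`: a coupled decode/encode
-- pair.  This cannot express everything a real lens can, but it is
-- enough to give the category and functor structure described above.
data LensFast' a b = LensFast'
    { decode :: a -> b
    , encode :: b -> a
    }

-- Composition and identity with the signatures given above (note the
-- lens-style composition order, the reverse of Control.Category):
(.) :: LensFast' a b -> LensFast' b c -> LensFast' a c
LensFast' dec enc . LensFast' dec' enc' =
    LensFast' (dec' Prelude.. dec) (enc Prelude.. enc')

id :: LensFast' a a
id = LensFast' Prelude.id Prelude.id

-- A van Laarhoven lens, as in lens-family:
type Lens' a b = forall f . Functor f => (b -> f b) -> (a -> f a)

-- The functor into ordinary lenses.  The laws
-- `optimal (f . g) = optimal f . optimal g` and `optimal id = id`
-- follow by unfolding these definitions.
optimal :: LensFast' a b -> Lens' a b
optimal (LensFast' dec enc) k a = fmap enc (k (dec a))
```

Under this representation `optimal` produces an isomorphism-style lens, which is exactly the kind of lens (like `decodeUtf8`) that the rest of the thread says triggers the quadratic behavior.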
>> This representation I've already nailed down and it would be
>> equivalent to a free monad transformer over the `draw`/`unDraw`
>> primitive operations, with an optional bind in the base monad for
>> efficiency:
>>
>>     data ParserFast a m r
>>         = M (m (ParserFast a m r))
>>         | Draw (Maybe a -> ParserFast a m r)
>>         | UnDraw a (ParserFast a m r)
>>
>>     instance Monad m => Monad (ParserFast a m) where ...
>>
>>     instance MonadTrans (ParserFast a) where ...
>>
>> This would come with an `interpret` function that would be a monad
>> morphism from `ParserFast` to `Parser`:
>>
>>     interpret :: Monad m => ParserFast a m r -> Parser a m r
>>
>> This would obey the following monad morphism laws:
>>
>>     interpret (do { x <- m; f x }) = do { x <- interpret m; interpret (f x) }
>>
>>     interpret (return x) = return x
>>
>> ... and monad morphism laws are also functor laws if you view them
>> from this perspective:
>>
>>     interpret . (f >=> g) = interpret . f >=> interpret . g
>>
>>     interpret . return = return
>>
>> The last piece of the puzzle is a `zoom_` function designed to work
>> with optimized `LensFast'`s and `ParserFast`s. It would have this
>> type:
>>
>>     zoom_ :: Monad m => LensFast' a b -> ParserFast b m r -> ParserFast a m r
>>
>> ... and it would obey this equation:
>>
>>     zoom (optimal lensfast) (interpret parserfast)
>>         = interpret (zoom_ lensfast parserfast)
>>
>> That equation would be a rewrite rule that turns the inefficient
>> `zoom` into the much more efficient `zoom_`, which doesn't do
>> unnecessary round-trips or trigger quadratic time complexity.
>>
>> The way you put it all together is that the default user-facing API
>> would define all lenses and parsers in terms of `optimal` and
>> `interpret`, i.e.:
>>
>>     exampleParser = interpret internalParserFast
>>
>>     exampleLens = optimal internalLensFast
>>
>> Then if the user writes something like:
>>
>>     zoom exampleLens exampleParser
>>
>> ...
>> then `exampleParser` and `exampleLens` would be written to force
>> inlining early, exposing the `interpret` and `optimal` functions:
>>
>>     zoom (optimal internalLensFast) (interpret internalParserFast)
>>
>> ... then that would trigger the rewrite rule:
>>
>>     interpret (zoom_ internalLensFast internalParserFast)
>>
>> ... and now the code would be efficient and avoid the problems
>> Michael described.
>>
>> You can also optimize some other cases automatically, too. For
>> example, if the user composes two lenses, which are each internally
>> optimized:
>>
>>     lens1 = optimal internalLens1
>>     lens2 = optimal internalLens2
>>
>>     parser = interpret internalParser
>>
>>     example = zoom (lens1 . lens2) parser
>>
>> ... and that is inlined:
>>
>>     example = zoom (optimal internalLens1 . optimal internalLens2)
>>                    (interpret internalParser)
>>
>> ... then you can provide a rewrite rule using the first functor law
>> for `optimal` to rewrite the two lenses to a single optimized lens:
>>
>>     example = zoom (optimal (internalLens1 . internalLens2))
>>                    (interpret internalParser)
>>
>> That is now in the right form for the `zoom_` rewrite rule and
>> everything just works:
>>
>>     example = interpret (zoom_ (internalLens1 . internalLens2)
>>                                internalParser)
>>
>> So far so good! We can expect those rewrite rules to fire reliably
>> and generate efficient code.
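As a sketch, those two equations could be transcribed into GHC RULES pragmas roughly as follows (hypothetical: `zoom_`, `optimal`, and `interpret` are the proposed functions from this thread, the rule names are invented, and a real implementation would also need INLINE and phase control so the left-hand sides reach the simplifier in exactly this syntactic shape):

```haskell
-- Hypothetical transcription of the two equations into rewrite rules:
{-# RULES
    "zoom/optimal/interpret" forall l p.
        zoom (optimal l) (interpret p) = interpret (zoom_ l p)
#-}

{-# RULES
    "optimal/compose" forall l1 l2.
        optimal l1 . optimal l2 = optimal (l1 . l2)
#-}
```

The second rule applies the first functor law for `optimal` to collapse composed lenses into the single-`optimal` form that the first rule can match, which is exactly the `lens1 . lens2` case walked through above.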
>> The problematic part is where you `zoom` over multiple parsers like
>> this:
>>
>>     lens = optimal internalLens
>>     parser1 = interpret internalParser1
>>     parser2 = interpret internalParser2
>>
>>     example = zoom lens $ do
>>         x <- parser1
>>         y <- parser2
>>         return (x, y)
>>
>> If you inline those you get:
>>
>>     example = zoom (optimal internalLens) $ do
>>         x <- interpret internalParser1
>>         y <- interpret internalParser2
>>         return (x, y)
>>
>> In theory you could use the monad morphism laws for `interpret` to
>> factor out the `interpret`:
>>
>>     example = zoom (optimal internalLens) $ interpret $ do
>>         x <- internalParser1
>>         y <- internalParser2
>>         return (x, y)
>>
>> Now that's in a form that can be optimized by the rewrite rule for
>> `zoom_`:
>>
>>     example = interpret $ zoom_ internalLens $ do
>>         x <- internalParser1
>>         y <- internalParser2
>>         return (x, y)
>>
>> **HOWEVER** in practice I don't yet know a way to get the monad
>> morphism rewrite rules to factor out the `interpret` in all cases,
>> which is necessary to trigger the `zoom_` rewrite rule. This is where
>> Haskell's rewrite rule system falls flat on its face. If anybody
>> wants a longer explanation of why, I can elaborate more on this.
>>
>> In practice, I'm willing to put up with this inefficiency and just
>> explain to users how to work around this latter issue by either:
>>
>> a) distributing the `zoom` over the parsers, like this:
>>
>>     example = do
>>         x <- zoom lens parser1
>>         y <- zoom lens parser2
>>         return (x, y)
>>
>> Note that this distribution only works if the lens does not violate
>> the monad morphism laws for `zoom`. Fortunately, the lenses that
>> trigger this performance issue are precisely the same lenses that
>> satisfy the monad morphism laws for `zoom` (example: the
>> `decodeUtf8` lens). Vice versa, the lenses that violate the `zoom`
>> monad morphism laws are the ones that give excellent performance in
>> conjunction with `zoom` (i.e. delimiting lenses like `splitAt`).
>> b) exposing the internal lenses and parsers so that users can use
>> them directly rather than relying on rewrite rules.
>>
>> I prefer approach (a) because then we can keep the internals hidden.
>>
>> Note that this solution requires that `pipes-parse` provides its own
>> `zoom` function (which is really easy), or alternatively defines the
>> rewrite rule in terms of the `zoom` function from the `lens-family`
>> package. A rewrite rule in terms of the `zoom` function from `lens`
>> is out of the question since I don't want a `lens` dependency.
>>
>> Does that answer your question?
>>
>> On 02/21/2014 03:37 AM, Michael Thompson wrote:
>>
>>> I was looking at Michael Snoyman's post on pipes parsers,
>>>
>>> http://www.yesodweb.com/blog/2014/02/ideas-pipes-parse
>>>
>>> and working through some of the reddit controversy,
>>>
>>> http://www.reddit.com/r/haskell/comments/1xmmtn/some_ideas_for_pipesparse/
>>>
>>> trying to think whether it might matter to `pipes-text`.
>>>
>>> I worked from the attempted pipes variant of his conduit parser for
>>> a special archival format for a 'File' type. Here is the result I
>>> will discuss below:
>>>
>>> https://gist.github.com/michaelt/9122379
>>>
>>> The first form of the parser (`fileParser0`) was not so bad when I
>>> used it on simple files. Then I increased the length of the file to
>>> be parsed into a succession of `Files`, resulting in a 1M file.
>>> (You can write a suitable file by uncommenting the first line in
>>> `main`.)
>>>
>>> On 1M of data, the obvious ways of formulating the parser in the
>>> pipes-parse style seem to go wildly wrong. The extravagance of my
>>> computer's response -- "swap space full", etc. -- may have something
>>> to do with OS X vs. ghc-7.6, but they are obviously bad programs. I
>>> couldn't get them to complete.
>>>
>>> That is the result if I use either of `fileParser0` or
>>> `fileParser1`.
>>> If however I use `parse2`, where I limit each use of `zoom utf8` by
>>> a prior application of `PB.splitAt` (or break), the 1M file is done
>>> in half a second and uses no significant memory.
>>>
>>> Is there a mistake I am making, a mistake in the implementation of
>>> something, or some simple rule that can be stated to avoid such
>>> things?
>>>
>>> Also I wonder if the result of using e.g. `fileParser1` is as
>>> pathological on other OSes.
>>>
>>> yours, Michael
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Haskell Pipes" group.
>>> To unsubscribe from this group and stop receiving emails from it,
>>> send an email to [email protected].
>>> To post to this group, send email to [email protected].
