Gabriel, isn't Edward Kmett's Machines.Process an example of another strategy that evades that quadratic blowup? http://hackage.haskell.org/package/machines-0.2.3.1/docs/Data-Machine-Process.html (Or is this a totally unrelated problem?)
On Thu, Feb 20, 2014 at 11:59 PM, Gabriel Gonzalez <[email protected]> wrote:

> So the one part of Michael's post that was correct was that each use of
> `zoom` adds a constant O(1) overhead to every subsequent command. The
> worst case scenario is a quadratic blowup for code like this:
>
>     forever $ do
>         a <- zoom lens parser
>         doSomethingWith a
>
> Every iteration of `forever` will accumulate an additional O(1)
> overhead, so by the Nth iteration you get O(N) overhead per iteration,
> and the total cost of N iterations is O(N^2). This is what was
> happening in your code, since your code is basically of the form:
>
>     replicateM_ n $ do
>         a <- zoom lens parser
>         doSomethingWith a
>
> ... so that triggers O(N^2) space and time complexity, where `n` in
> your example was `10000`.
>
> I'm working on a solution to fix this and I will summarize it in broad
> strokes here. This solution is backwards-compatible with the current
> API and in most use cases will compile to code that is more efficient
> (both in time complexity and in constant factors). This would solve the
> problem that Michael described.
>
> There would be a new `Pipes.Parse.Fast` internal module (for
> implementers only) which will have two new types. One of them is a more
> efficient analog of `Lens'`. Let's call it `LensFast'` for lack of a
> better name.
>
> I will omit the implementation of `LensFast'` since I'm still playing
> with different ideas, but I can describe its interface. First, it would
> form a category, like `Lens'`es:
>
>     (.) :: LensFast' a b -> LensFast' b c -> LensFast' a c
>
>     id :: LensFast' a a
>
> ... and there would be a functor from `LensFast'` to `Lens'`:
>
>     optimal :: LensFast' a b -> Lens' a b
>
> ... that obeys these functor laws:
>
>     optimal (f . g) = optimal f . optimal g
>
>     optimal id = id
>
> There would also be a more efficient parser type, which I will
> temporarily call `ParserFast`.
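To make that interface concrete, here is a minimal sketch. The email deliberately leaves the representation of `LensFast'` open, so the newtype wrapper, `composeFast`, and `idFast` below are placeholders invented for illustration, not the real internals:

```haskell
{-# LANGUAGE RankNTypes #-}

-- A van Laarhoven lens, as in the `lens-family` package
type Lens' a b = forall f. Functor f => (b -> f b) -> a -> f a

-- Placeholder for the optimized representation, which the email leaves
-- unspecified; wrapping an ordinary lens is enough to show the interface.
newtype LensFast' a b = LensFast' (Lens' a b)

-- Composition and identity with the types given in the email
composeFast :: LensFast' a b -> LensFast' b c -> LensFast' a c
composeFast (LensFast' f) (LensFast' g) = LensFast' (f . g)

idFast :: LensFast' a a
idFast = LensFast' id

-- The functor into ordinary lenses; with this placeholder representation
-- both functor laws hold definitionally.
optimal :: LensFast' a b -> Lens' a b
optimal (LensFast' l) = l
```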
> This representation I've already nailed down, and it would be
> equivalent to a free monad transformer over the `draw`/`unDraw`
> primitive operations, with an optional bind in the base monad for
> efficiency:
>
>     data ParserFast a m r
>         = M (m (ParserFast a m r))
>         | Draw (Maybe a -> ParserFast a m r)
>         | UnDraw a (ParserFast a m r)
>
>     instance Monad m => Monad (ParserFast a m) where ...
>
>     instance MonadTrans (ParserFast a) where ...
>
> This would come with an `interpret` function that would be a monad
> morphism from `ParserFast` to `Parser`:
>
>     interpret :: Monad m => ParserFast a m r -> Parser a m r
>
> This would obey the following monad morphism laws:
>
>     interpret (do { x <- m; f x }) = do { x <- interpret m; interpret (f x) }
>
>     interpret (return x) = return x
>
> ... and monad morphism laws are also functor laws if you view them from
> this perspective:
>
>     interpret . (f >=> g) = interpret . f >=> interpret . g
>
>     interpret . return = return
>
> The last piece of the puzzle is a `zoom_` function designed to work
> with optimized `LensFast'`s and `ParserFast`s. It would have this type:
>
>     zoom_ :: Monad m => LensFast' a b -> ParserFast b m r -> ParserFast a m r
>
> ... and it would obey this equation:
>
>     zoom (optimal lensFast) (interpret parserFast)
>         = interpret (zoom_ lensFast parserFast)
>
> That equation would be a rewrite rule that turns the inefficient `zoom`
> into the much more efficient `zoom_` that doesn't do unnecessary
> round-trips or trigger quadratic time complexity.
>
> The way you put it all together is that the default user-facing API
> would define all lenses and parsers in terms of `optimal` and
> `interpret`, i.e.:
>
>     exampleParser = interpret internalParserFast
>
>     exampleLens = optimal internalLensFast
>
> Then if the user writes something like:
>
>     zoom exampleLens exampleParser
>
> ...
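For concreteness, the elided instances could be filled in roughly as follows. Note that the declaration as stated in the email has no constructor for pure results, so this sketch adds a `Pure` leaf (without one, `return` cannot be written); treat the whole block as an illustration of the shape, not the actual `pipes-parse` internals:

```haskell
import Control.Monad (ap, liftM)
import Control.Monad.Trans.Class (MonadTrans (..))

data ParserFast a m r
    = Pure r                              -- added here: a pure result
    | M (m (ParserFast a m r))            -- an interleaved bind in the base monad
    | Draw (Maybe a -> ParserFast a m r)  -- request the next input element
    | UnDraw a (ParserFast a m r)         -- push an element back onto the input

instance Monad m => Functor (ParserFast a m) where
    fmap = liftM

instance Monad m => Applicative (ParserFast a m) where
    pure  = Pure
    (<*>) = ap

instance Monad m => Monad (ParserFast a m) where
    Pure r     >>= f = f r
    M m'       >>= f = M (fmap (>>= f) m')
    Draw k     >>= f = Draw (\ma -> k ma >>= f)
    UnDraw a p >>= f = UnDraw a (p >>= f)

instance MonadTrans (ParserFast a) where
    lift m' = M (fmap Pure m')

-- The draw/unDraw primitive operations themselves:
draw :: ParserFast a m (Maybe a)
draw = Draw Pure

unDraw :: a -> ParserFast a m ()
unDraw a = UnDraw a (Pure ())
```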
> then `exampleParser` and `exampleLens` would be written to force
> inlining early, exposing the `interpret` and `optimal` functions:
>
>     zoom (optimal internalLensFast) (interpret internalParserFast)
>
> ... then that would trigger the rewrite rule:
>
>     interpret (zoom_ internalLensFast internalParserFast)
>
> ... and now the code would be efficient and avoid the problems Michael
> described.
>
> You can also optimize some other cases automatically, too. For example,
> if the user composes two lenses, which are each internally optimized:
>
>     lens1 = optimal internalLens1
>     lens2 = optimal internalLens2
>
>     parser = interpret internalParser
>
>     example = zoom (lens1 . lens2) parser
>
> ... and that is inlined:
>
>     example = zoom (optimal internalLens1 . optimal internalLens2)
>                    (interpret internalParser)
>
> ... then you can provide a rewrite rule using the first functor law for
> `optimal` to rewrite the two lenses to a single optimized lens:
>
>     example = zoom (optimal (internalLens1 . internalLens2))
>                    (interpret internalParser)
>
> That is now in the right form for the `zoom_` rewrite rule and
> everything just works:
>
>     example = interpret (zoom_ (internalLens1 . internalLens2)
>                                internalParser)
>
> So far so good! We can expect those rewrite rules to fire reliably and
> generate efficient code.
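In GHC's `RULES` syntax, the two rewrites described above would presumably look something like the fragment below. This is a guess at the shape only: the rule names are invented, and in practice `zoom`, `optimal`, `interpret`, and `zoom_` would need `NOINLINE` or phase annotations so the rules get a chance to fire before the optimizer inlines the definitions away:

```haskell
{-# RULES
"zoom/interpret"  forall l p.
    zoom (optimal l) (interpret p) = interpret (zoom_ l p) ;
"optimal/optimal" forall f g.
    optimal f . optimal g = optimal (f . g)
  #-}
```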
> The problematic part is where you `zoom` over multiple parsers like
> this:
>
>     lens = optimal internalLens
>     parser1 = interpret internalParser1
>     parser2 = interpret internalParser2
>
>     example = zoom lens $ do
>         x <- parser1
>         y <- parser2
>         return (x, y)
>
> If you inline those you get:
>
>     example = zoom (optimal internalLens) $ do
>         x <- interpret internalParser1
>         y <- interpret internalParser2
>         return (x, y)
>
> In theory you could use the monad morphism laws for `interpret` to
> factor out the `interpret`:
>
>     example = zoom (optimal internalLens) $ interpret $ do
>         x <- internalParser1
>         y <- internalParser2
>         return (x, y)
>
> Now that's in a form that can be optimized by the rewrite rule for
> `zoom_`:
>
>     example = interpret $ zoom_ internalLens $ do
>         x <- internalParser1
>         y <- internalParser2
>         return (x, y)
>
> **HOWEVER**, in practice I don't yet know a way to get the monad
> morphism rewrite rules to factor out the `interpret` in all cases,
> which is necessary to trigger the `zoom_` rewrite rule. This is where
> Haskell's rewrite rule system falls flat on its face. If anybody wants
> a longer explanation of why, I can elaborate more on this.
>
> In practice, I'm willing to put up with this inefficiency and just
> explain to users how to work around this latter issue by either:
>
> a) distributing the `zoom` over the parsers, like this:
>
>     example = do
>         x <- zoom lens parser1
>         y <- zoom lens parser2
>         return (x, y)
>
> Note that this distribution only works if the lens does not violate the
> monad morphism laws for `zoom`. Fortunately, the lenses that trigger
> this performance issue are precisely the same lenses that satisfy the
> monad morphism laws for `zoom` (example: the `decodeUtf8` lens). Vice
> versa, the lenses that violate the `zoom` monad morphism laws are the
> ones that give excellent performance in conjunction with `zoom` (i.e.
> delimiting lenses like `splitAt`).
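As an aside, why distributing `zoom` is safe for morphism-respecting lenses can be seen in a simpler setting. The following self-contained sketch uses plain `State`, not pipes-parse; `zoomState`, `getL`, and `setL` are hypothetical stand-ins written for this illustration. It checks that zooming through a field-accessing lens distributes over `>>=`:

```haskell
{-# LANGUAGE RankNTypes #-}

import Control.Monad.Trans.State (State, get, put, runState, state)
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

type Lens' a b = forall f. Functor f => (b -> f b) -> a -> f a

getL :: Lens' a b -> a -> b
getL l = getConst . l Const

setL :: Lens' a b -> b -> a -> a
setL l b = runIdentity . l (\_ -> Identity b)

-- A zoom for plain State, defined via get/set; for lenses like this it
-- is a monad morphism, which is exactly what licenses workaround (a).
zoomState :: Lens' a b -> State b r -> State a r
zoomState l m = state $ \a ->
    let (r, b') = runState m (getL l a)
    in  (r, setL l b' a)

-- A lens onto the first component of a pair
_1 :: Lens' (x, y) x
_1 k (x, y) = fmap (\x' -> (x', y)) k x

bump :: State Int Int
bump = do n <- get; put (n + 1); return n

main :: IO ()
main = print (lhs == rhs)  -- prints True: both are ((0,1),(2,"s"))
  where
    lhs = runState (zoomState _1 (do a <- bump; b <- bump; return (a, b)))
                   (0 :: Int, "s")
    rhs = runState (do a <- zoomState _1 bump
                       b <- zoomState _1 bump
                       return (a, b))
                   (0 :: Int, "s")
```

For a delimiting lens like `splitAt` the analogous equation fails, because each separate `zoom` re-delimits the stream, which is why (per the email) those lenses both perform well and refuse to distribute.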
> b) exposing the internal lenses and parsers so that users can use them
> directly rather than relying on rewrite rules.
>
> I prefer approach (a) because then we can keep the internals hidden.
>
> Note that this solution requires that `pipes-parse` provide its own
> `zoom` function (which is really easy), or alternatively define the
> rewrite rule in terms of the `zoom` function from the `lens-family`
> package. A rewrite rule in terms of the `zoom` function from `lens` is
> out of the question since I don't want a `lens` dependency.
>
> Does that answer your question?
>
> On 02/21/2014 03:37 AM, Michael Thompson wrote:
>
>> I was looking at Michael Snoyman's post on pipes parsers,
>>
>> http://www.yesodweb.com/blog/2014/02/ideas-pipes-parse
>>
>> and working through some of the reddit controversy,
>>
>> http://www.reddit.com/r/haskell/comments/1xmmtn/some_ideas_for_pipesparse/
>>
>> trying to think about whether it might matter to `pipes-text`.
>>
>> I worked from the attempted pipes variant of his conduit parser for a
>> special archival format with a 'File' type. Here is the result I will
>> discuss below:
>>
>> https://gist.github.com/michaelt/9122379
>>
>> The first form of the parser (`fileParser0`) was not so bad when I
>> used it on simple files. Then I increased the length of the file to be
>> parsed into a succession of `File`s, resulting in a 1M file. (You can
>> write a suitable file by uncommenting the first line in `main`.)
>>
>> On 1M of data, the obvious ways of formulating the parser in the
>> pipes-parse style seem to go wildly wrong. The extravagance of my
>> computer's response -- "swap space full", etc. -- may have something
>> to do with OS X vs. GHC 7.6, but they are obviously bad programs. I
>> couldn't get them to complete.
>>
>> That is the result if I use either of `fileParser0` or `fileParser1`.
>> If however I use `parse2`, where I limit each use of `zoom utf8` by a
>> prior application of `PB.splitAt` (or `break`), the 1M file is done in
>> a half-second and uses no significant memory.
>>
>> Is there a mistake I am making, a mistake in the implementation of
>> something, or some simple rule that can be stated to avoid such
>> things?
>>
>> Also, I wonder if the result of using e.g. `fileParser1` is as
>> pathological on other OSes.
>>
>> yours, Michael

--
You received this message because you are subscribed to the Google Groups "Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
