This is not related to the quadratic issue from the naive free monad
bind. Switching to the codensity transformation would not fix the
problem at all.
The tl;dr way to convince yourself of this is to just compare the
performance of the following two parsers:
example1 = forever $ zoom id draw
example2 = forever draw
The former will trigger the quadratic blowup whereas the second one will
not. The issue is `zoom`, not `forever`.
In more detail, the issue is that `zoom` currently works like this:
* Decode all input elements (lazily)
* Run the parser on that
* Encode all input elements (lazily)
So let's say that your input has 1000 elements and your parser only
consumes 2 elements. After it's done parsing those two elements, the
remaining 998 elements will all still be decoded and re-encoded. This
decoding/re-encoding is lazy, but it's not free: it triggers when you
demand each element, adding an O(1) overhead per element.
Now imagine that you call `zoom` N times: every subsequent element will
be decoded and re-encoded N times, so the total running time over N
calls is O(N^2) (i.e. N * (N + 1) / 2).
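To double-check that arithmetic, here is a toy cost model (my own
illustration, not pipes-parse code): if the i-th `zoom` call pays i
units of decode/re-encode overhead, then N calls cost the triangular
number N * (N + 1) / 2 in total:

```haskell
-- Toy cost model, not real pipes code: the i-th `zoom` call pays i
-- units of decode/re-encode overhead, so N calls cost the triangular
-- number N * (N + 1) / 2 in total.
totalOverhead :: Int -> Int
totalOverhead n = sum [1 .. n]

main :: IO ()
main = do
    print (totalOverhead 1000)         -- 500500
    print (1000 * (1000 + 1) `div` 2)  -- 500500, the closed form agrees
```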
On 02/21/2014 12:31 PM, Carter Schonwald wrote:
Gabriel, isn't edward k's Machines.Process an example of another
strategy that evades that quadratic blowup?
http://hackage.haskell.org/package/machines-0.2.3.1/docs/Data-Machine-Process.html
(or is this a totally unrelated problem?)
On Thu, Feb 20, 2014 at 11:59 PM, Gabriel Gonzalez
<[email protected]> wrote:
So the one part of Michael's post that was correct was that each
use of `zoom` adds a constant O(1) overhead to every subsequent
command. The worst case scenario is a quadratic blowup for code
like this:
forever $ do
    a <- zoom lens parser
    doSomethingWith a
Every iteration of `forever` will accumulate an additional O(1)
overhead, so by the Nth iteration, you get O(N) overhead per
iteration, and the total cost of N iterations is O(N^2). This is
what was happening in your code since your code is basically of
the form:
replicateM_ n $ do
    a <- zoom lens parser
    doSomethingWith a
... so that triggers O(N^2) space and time complexity, where `n`
in your example was `10000`.
I'm working on a solution to fix this and I will summarize it in
broad strokes here. This solution is backwards-compatible with
the current API and in most use cases will compile to code that is
more efficient (both in time complexity and constant factors).
This would solve the problem that Michael described.
There would be a new `Pipes.Parse.Fast` internal module (for
implementers only) with two new types. One of them
would be a more efficient analog of `Lens'`. Let's call it
`LensFast'` for lack of a better name.
I will omit the implementation of `LensFast'` since I'm still
playing with different ideas, but I can describe its interface.
First, it would form a category, like `Lens'`es:
(.) :: LensFast' a b -> LensFast' b c -> LensFast' a c
id :: LensFast' a a
... and there would be a functor from `LensFast'` to `Lens'`:
optimal :: LensFast' a b -> Lens' a b
... that obeys these functor laws:
optimal (f . g) = optimal f . optimal g
optimal id = id
There would also be a more efficient parser type, which I will
temporarily call `ParserFast`. This representation I've already
nailed down and it would be equivalent to a free monad transformer
over the `draw/unDraw` primitive operations, with an optional bind
in the base monad for efficiency:
data ParserFast a m r
    = Pure r
    | M (m (ParserFast a m r))
    | Draw (Maybe a -> ParserFast a m r)
    | UnDraw a (ParserFast a m r)
instance Monad m => Monad (ParserFast a m) where ...
instance MonadTrans (ParserFast a) where ...
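To make the sketch concrete, here is one way those instances could
look, together with a toy interpreter for testing. This is a hedged
sketch, not the eventual pipes-parse implementation: the `Pure`
constructor is an assumption of mine (a free monad transformer needs a
constructor to represent `return`), and `drawF` and `feed` are
hypothetical names introduced only for this example:

```haskell
import Data.Functor.Identity (Identity (..))

-- Sketch only: `Pure` is an assumed extra constructor, needed so that
-- the type can represent `return`.
data ParserFast a m r
    = Pure r
    | M (m (ParserFast a m r))
    | Draw (Maybe a -> ParserFast a m r)
    | UnDraw a (ParserFast a m r)

instance Monad m => Functor (ParserFast a m) where
    fmap f = go
      where
        go (Pure r)     = Pure (f r)
        go (M m)        = M (fmap go m)
        go (Draw k)     = Draw (go . k)
        go (UnDraw a p) = UnDraw a (go p)

instance Monad m => Applicative (ParserFast a m) where
    pure = Pure
    pf <*> px = pf >>= \f -> fmap f px

instance Monad m => Monad (ParserFast a m) where
    return = pure
    Pure r     >>= f = f r
    M m        >>= f = M (fmap (>>= f) m)
    Draw k     >>= f = Draw ((>>= f) . k)
    UnDraw a p >>= f = UnDraw a (p >>= f)

-- The MonadTrans lift would be:  lift = M . fmap Pure

-- Hypothetical primitive: request one element (Nothing = end of input).
drawF :: ParserFast a m (Maybe a)
drawF = Draw Pure

-- Toy interpreter for testing the sketch: run a parser against a list
-- of inputs, returning the result and the leftover elements.
feed :: Monad m => [a] -> ParserFast a m r -> m (r, [a])
feed xs     (Pure r)     = return (r, xs)
feed xs     (M m)        = m >>= feed xs
feed (y:ys) (Draw k)     = feed ys (k (Just y))
feed []     (Draw k)     = feed [] (k Nothing)
feed xs     (UnDraw a p) = feed (a:xs) p

main :: IO ()
main = do
    let p = do { x <- drawF; y <- drawF; return (x, y) }
        Identity (r, rest) = feed [1 :: Int, 2, 3] p
    print r     -- (Just 1,Just 2)
    print rest  -- [3]
```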
This would come with an `interpret` function that would be a monad
morphism from `ParserFast` to `Parser`:
interpret :: Monad m => ParserFast a m r -> Parser a m r
This would obey the following monad morphism laws:
interpret (do { x <- m; f x }) = do { x <- interpret m; interpret (f x) }

interpret (return x) = return x
... and monad morphism laws are also functor laws if you view
them from this perspective:
interpret . (f >=> g) = interpret . f >=> interpret . g
interpret . return = return
The last piece of the puzzle is a `zoom_` function designed to
work with optimized `LensFast'`s and `ParserFast`s. It would have
this type:
zoom_ :: Monad m => LensFast' a b -> ParserFast b m r -> ParserFast a m r
... and it would obey this equation:
zoom (optimal lensfast) (interpret parserfast)
    = interpret (zoom_ lensfast parserfast)
That equation would be a rewrite rule that turns the inefficient
`zoom` into the much more efficient `zoom_` that doesn't do
unnecessary round-trips or trigger quadratic time complexity.
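In GHC terms, that equation would become a rewrite rule along these
lines (a sketch only: the rule name is a placeholder, and the actual
rule would live in the internal module next to `zoom_`):

```haskell
{-# RULES
"zoom/optimal/interpret" forall l p.
    zoom (optimal l) (interpret p) = interpret (zoom_ l p)
  #-}
```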
The way you put it all together is that the default user-facing
API would define all lenses and parsers in terms of `optimal` and
`interpret`. i.e.:
exampleParser = interpret internalParserFast
exampleLens = optimal internalLensFast
Then if the user writes something like:
zoom exampleLens exampleParser
... then `exampleParser` and `exampleLens` would be written to
force inlining early, exposing the `interpret` and `optimal`
functions:
zoom (optimal internalLensFast) (interpret internalParserFast)
... then that would trigger the rewrite rule:
interpret (zoom_ internalLensFast internalParserFast)
... and now the code would be efficient and avoid the problems
Michael described.
You can optimize some other cases automatically, too. For
example, if the user connects two lenses, which are each
internally optimized:
lens1 = optimal internalLens1
lens2 = optimal internalLens2
parser = interpret internalParser
example = zoom (lens1 . lens2) parser
... and that is inlined:
example = zoom (optimal internalLens1 . optimal internalLens2)
               (interpret internalParser)
Then you can provide a rewrite rule using the first functor law
for `optimal` to rewrite the two lenses to a single optimized lens:
example = zoom (optimal (internalLens1 . internalLens2))
               (interpret internalParser)
Then that is now in the right form for the `zoom_` rewrite rule
and everything just works!
example = interpret (zoom_ (internalLens1 . internalLens2)
                           internalParser)
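Stated as a GHC rewrite rule, that first functor law for `optimal`
would look roughly like this sketch (placeholder rule name again):

```haskell
{-# RULES
"optimal/compose" forall f g.
    optimal f . optimal g = optimal (f . g)
  #-}
```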
So far so good! We can expect those rewrite rules to fire
reliably and generate efficient code.
The problematic part is where you `zoom` over multiple parsers
like this:
lens = optimal internalLens
parser1 = interpret internalParser1
parser2 = interpret internalParser2
example = zoom lens $ do
    x <- parser1
    y <- parser2
    return (x, y)
If you inline those you get:
example = zoom (optimal internalLens) $ do
    x <- interpret internalParser1
    y <- interpret internalParser2
    return (x, y)
In theory you could use the monad morphism laws for `interpret` to
factor out the `interpret`:
example = zoom (optimal internalLens) $ interpret $ do
    x <- internalParser1
    y <- internalParser2
    return (x, y)
Now that's in a form that can be optimized by the rewrite rule for
`zoom_`:
example = interpret $ zoom_ internalLens $ do
    x <- internalParser1
    y <- internalParser2
    return (x, y)
**HOWEVER** in practice I don't yet know a way to get the monad
morphism rewrite rules to factor out the `interpret` in all cases,
which is necessary to trigger the `zoom_` rewrite rule. This is
where Haskell's rewrite rule system falls flat on its face. If
anybody wants a longer explanation of why, I can elaborate more on
this.
In practice, I'm willing to put up with this inefficiency and just
explain to users how to work around this latter issue by either:
a) distributing the `zoom` over the parsers, like this:
example = do
    x <- zoom lens parser1
    y <- zoom lens parser2
    return (x, y)
Note that this distribution only works if the lens does not
violate the monad morphism laws for `zoom`. Fortunately, the
lenses that trigger this performance issue are precisely the same
lenses that satisfy the monad morphism laws for `zoom` (example:
the `decodeUtf8` lens). Conversely, the lenses that violate the
`zoom` monad morphism laws are the ones that give excellent
performance in conjunction with `zoom` (i.e. delimiting lenses
like `splitAt`).
b) Expose the internal lenses and parsers so that users can use
them directly rather than relying on rewrite rules.
I prefer approach (a) because then we can keep the internals hidden.
Note that this solution requires that `pipes-parse` provides its
own `zoom` function (which is really easy), or alternatively
defines the rewrite rule in terms of the `zoom` function from the
`lens-family` package. A rewrite rule in terms of the `zoom`
function from `lens` is out of the question since I don't want a
`lens` dependency.
Does that answer your question?
On 02/21/2014 03:37 AM, Michael Thompson wrote:
I was looking at Michael Snoyman's post on pipes parsers,
http://www.yesodweb.com/blog/2014/02/ideas-pipes-parse
and working through some of the reddit controversy
http://www.reddit.com/r/haskell/comments/1xmmtn/some_ideas_for_pipesparse/
trying to think whether it might matter to `pipes-text`.
I worked from the attempted pipes variant of his conduit
parser for a special archival format for a 'File' type.
Here is the result I will discuss below
https://gist.github.com/michaelt/9122379
The first form of the parser (`fileParser0`) was
not so bad when I used it on simple files.
Then I increased the length of the file to be parsed
into a succession of `Files` resulting in a 1M file.
(You can write a suitable file by uncommenting
the first line in `main`)
On 1M of data, the obvious ways of
formulating the parser in the pipes-parse style
seem to go wildly wrong. The extravagance of
my computer's response -- "swap space full", etc. --
may have something to do with OS X vs. ghc-7.6,
but they are obviously bad programs. I couldn't
get them to complete.
That is the result if I use either of `fileParser0` or
`fileParser1`.
If however I use `parse2`, where I limit
each use of `zoom utf8` by a prior
application of `PB.splitAt` (or break),
the 1M file is done in a half-second and
uses no significant memory.
Is there a mistake I am making, a mistake
in the implementation of something,
or some simple rule that can be stated
to avoid such things?
Also I wonder if the result of using e.g. `fileParser1` is as
pathological on other os's.
yours Michael
--
You received this message because you are subscribed to the
Google Groups "Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from
it, send an email to
[email protected]
<mailto:haskell-pipes%[email protected]>.
To post to this group, send email to
[email protected]
<mailto:[email protected]>.