#1544: Derived Read instances for recursive datatypes with infix constructors
are
too inefficient
-------------------------------------+--------------------------------------
Reporter: [EMAIL PROTECTED] | Owner:
Type: bug | Status: new
Priority: normal | Milestone: 6.8
Component: Compiler | Version: 6.6.1
Severity: normal | Resolution:
Keywords: | Difficulty: Unknown
Os: Unknown | Testcase:
Architecture: Unknown |
-------------------------------------+--------------------------------------
Old description:
> Consider this definition:
>
> data Exp = C | Exp :+: Exp | Exp :-: Exp deriving ( Read, Show )
>
> Now, try something like:
>
> > read "((((((((((C))))))))))" :: T
>
> Even this simple expression may take several seconds to parse. It gets
> worse if you keep adding parenthesis. And even worse if you add more
> infix constructors....
New description:
Consider this definition:
{{{
data Exp = C | Exp :+: Exp | Exp :-: Exp deriving ( Read, Show )
}}}
Now, try something like:
{{{
> read "((((((((((C))))))))))" :: T
}}}
Even this simple expression may take several seconds to parse. It gets
worse if you keep adding parenthesis. And even worse if you add more infix
constructors....
Comment (by simonpj):
Here is the story as far as I can work it out. I don't have a solution
yet, but this records my diagnosis. I've asked Koen Claessen (who
originated the clever parsing technology) for help. Here is his paper
which describes it: http://www.cs.chalmers.se/~koen/pubs/entry-jfp04-
parser.html
For a data type like
{{{
data Exp = C | Exp :+: Exp
}}}
we get a parser like this:
{{{
readPrec = parens
((do Ident "C" <- lexP
return C)
+++
(prec 9 $
(do a <- step readPrec
Symbol ":+:" <- lexP
b <- step readPrec
return (a :+: b ))
))
}}}
The trouble comes because when we see a paren we don't know whether it
encloses the whole expression or just part
{{{
(C :+: C :+: C)
}}}
or
{{{
(C :+: C) :+: C
}}}
So the `a<-readPrec` line expands to a new unfolding of `readPrec`. We
get two parsers working in parallel, one expecting the closing paren at
the end,
{{{
(t1)
}}}
where `t1` is a term, and one expecting a term of form
{{{
(t1) :+: t2
}}}
(There are two more parsers, each expecting a plain C, but I'll ignore
those. Also in case you are wondering, the precedence mechanism cuts off
the left-recursion problem.)
So far so good. But when we consume a paren, each of these two parsers
generate two new parsers in exactly the same way. So after consuming one
paren we have FOUR parsers expecting terms of form
{{{
((t1))
((t1) :+: t2)
((t1)) :+: t2
((t1) :+: t2) :+: t3
}}}
This is clearly unacceptable, because after `n` parens we get `2^n`
parsers. At least that is my theory, and it certainly would explain what
is going on.
I have now forgotten why I originally adopted Koen's parallel parser idea!
I'm sure it's a good one. But this does seem like a serious problem.
Remember that we must generate a parser for data type `T` without looking
at the parsers for any other data types.
Furthermore, although I said I'd ignore it, we do get two (or more)
parsers each independently parsing the same lexeme.
{{{
(do { Ident "C" <- lexP; ...} +++ do { Ident "C" <- lexP; ...})
}}}
They get expanded out, and then dynamically recombined by the parser
machinery, but it seems like a lot of wasted work is done. But I
don’t think that's what's giving the problem here.
I attach a complete program `ParseInfix.hs` that demonstrates the problem,
with a hand-written read instance so that you don't have to guess what
`deriving` is doing. Just compile and run.
Simon
--
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/1544>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler_______________________________________________
Glasgow-haskell-bugs mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs