Re: [GHC] #1544: Derived Read instances for recursive datatypes with infix constructors are too inefficient

GHC Mon, 20 Aug 2007 14:30:26 -0700

#1544: Derived Read instances for recursive datatypes with infix constructors 
are
too inefficient
-------------------------------------+--------------------------------------
    Reporter:  [EMAIL PROTECTED]  |        Owner:         
        Type:  bug                   |       Status:  new    
    Priority:  normal                |    Milestone:  6.8    
   Component:  Compiler              |      Version:  6.6.1  
    Severity:  normal                |   Resolution:         
    Keywords:                        |   Difficulty:  Unknown
          Os:  Unknown               |     Testcase:         
Architecture:  Unknown               |  
-------------------------------------+--------------------------------------
Old description:


> Consider this definition:
>
> data Exp = C | Exp :+: Exp | Exp :-: Exp deriving ( Read, Show )
>
> Now, try something like:
>
> > read "((((((((((C))))))))))" :: T
>
> Even this simple expression may take several seconds to parse. It gets
> worse if you keep adding parenthesis. And even worse if you add more
> infix constructors....

New description:

 Consider this definition:
 {{{
 data Exp = C | Exp :+: Exp | Exp :-: Exp deriving ( Read, Show )
 }}}
 Now, try something like:
 {{{
 > read "((((((((((C))))))))))" :: T
 }}}
 Even this simple expression may take several seconds to parse. It gets
 worse if you keep adding parenthesis. And even worse if you add more infix
 constructors....

Comment (by simonpj):

 Here is the story as far as I can work it out.  I don't have a solution
 yet, but this records my diagnosis.  I've asked Koen Claessen (who
 originated the clever parsing technology) for help.  Here is his paper
 which describes it: http://www.cs.chalmers.se/~koen/pubs/entry-jfp04-
 parser.html

 For a data type like
 {{{
 data Exp = C | Exp :+: Exp
 }}}
 we get a parser like this:
 {{{
    readPrec = parens
                           ((do Ident "C" <- lexP
                                return C)
                          +++
                            (prec 9 $
                                (do a <- step readPrec
                                    Symbol ":+:" <- lexP
                                    b <- step readPrec
                                    return (a :+: b ))
                         ))
 }}}
 The trouble comes because when we see a paren we don't know whether it
 encloses the whole expression or just part
 {{{
         (C :+: C :+: C)
 }}}
 or
 {{{
         (C :+: C) :+: C
 }}}
 So the `a<-readPrec` line expands to a new unfolding of `readPrec`.  We
 get two parsers working in parallel, one expecting the closing paren at
 the end,
 {{{
         (t1)
 }}}
 where `t1` is a term, and one expecting a term of form
 {{{
         (t1) :+: t2
 }}}
 (There are two more parsers, each expecting a plain C, but I'll ignore
 those.  Also in case you are wondering, the precedence mechanism cuts off
 the left-recursion problem.)

 So far so good.  But when we consume a paren, each of these two parsers
 generate two new parsers in exactly the same way. So after consuming one
 paren we have FOUR parsers expecting terms of form
 {{{
         ((t1))
         ((t1) :+: t2)
         ((t1)) :+: t2
         ((t1) :+: t2) :+: t3
 }}}
 This is clearly unacceptable, because after `n` parens we get `2^n`
 parsers.  At least that is my theory, and it certainly would explain what
 is going on.


 I have now forgotten why I originally adopted Koen's parallel parser idea!
 I'm sure it's a good one.  But this does seem like a serious problem.
 Remember that we must generate a parser for data type `T` without looking
 at the parsers for any other data types.

 Furthermore, although I said I'd ignore it, we do get two (or more)
 parsers each independently parsing the same lexeme.
 {{{
         (do { Ident "C" <- lexP; ...} +++ do { Ident "C" <- lexP; ...})
 }}}
 They get expanded out, and then dynamically recombined by the parser
 machinery, but it seems like a lot of wasted work is done.    But I
 don’t think that's what's giving the problem here.

 I attach a complete program `ParseInfix.hs` that demonstrates the problem,
 with a hand-written read instance so that you don't have to guess what
 `deriving` is doing.  Just compile and run.

 Simon

-- 
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/1544>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

_______________________________________________
Glasgow-haskell-bugs mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs

Re: [GHC] #1544: Derived Read instances for recursive datatypes with infix constructors are too inefficient

Reply via email to