#1544: Derived Read instances for recursive datatypes with infix constructors 
are
too inefficient
-------------------------------------+--------------------------------------
    Reporter:  [EMAIL PROTECTED]  |        Owner:            
        Type:  bug                   |       Status:  new       
    Priority:  normal                |    Milestone:  6.8 branch
   Component:  Compiler              |      Version:  6.6.1     
    Severity:  normal                |   Resolution:            
    Keywords:                        |   Difficulty:  Unknown   
          Os:  Unknown               |     Testcase:            
Architecture:  Unknown               |  
-------------------------------------+--------------------------------------
Comment (by simonpj):

 Koen writes: Take the following example datatype.
 {{{
   data T = T :+: T | A
 }}}
 Generating a naive recursive parser for this leads to the bad behavior.

 However, for such simple recursive datatypes, we kind of know what
 parsers we should generate. For example, forthe above datatype we
 simply generate a parser that first parses a simple T, and then checks
 if there is an occurrence of :+:, in which case it tries to parse
 another T. Nice and linear.

 However, we can never do this modularly. Here is why. Imagine a module
 with the following datatype:
 {{{
   data T a b = a :+: b  deriving ( Read )
 }}}
 The only thing we can do here is to generate the naive parser.

 Imagine now another module that imports the above module, with a
 datatype declaration that looks like:
 {{{
   data A = (T A A) :*: A
          | C
          deriving ( Read )
 }}}
 Again, the only thing we can do here is to generate a naive parser,
 that uses the parser we generated for T. (We do not even know that the
 above datatype is recursive without looking at the definition of T.)

 Now, the resulting parser will again have the bad behavior: Any
 expression starting with many parentheses will lead to exponential
 behavior, because, for each parenthesis, we do not know if it belongs
 to something of type A or something of type T A A until we have parsed
 the whole expression.

 We ''can'' do something about recursive datatypes where the left
 argument of the operators has exactly the same type as the whole
 datatype. This works for:
 {{{
   data SnocList a = Nil | SnocList a :- a
 }}}
 for example, but not for:
 {{{
   data T a = T [a] :+: T [a] | Leaf a
 }}}
 I guess it is worth implementing this special case anyway, although it
 will not apply to all cases.


 -------------------------------------------------------------------------
 Here is the idea.


 First a simple case. For the following Haskell code:
 {{{
   infix 5 +
   infix 6 *

   data A
     = A + A
     | A * A
     | C B
 }}}
 We generate the following grammar:
 {{{
   A(n) ::= A(6) "+" A(6)       (n<=5)
          | A(7) "*" A(7)       (n<=6)
          | "C" B(0)
          | "(" A(0) ")"
 }}}
 Right now, you simply turn the above into a parser directly. However,
 we are going to "massage" the grammar a bit. First, we explicitly
 split up the grammar into parts, depending on precedence. For this, we
 need to sort and groups the operators according to precedences:
 {{{
   A(0..5) ::= A(6) "+" A(6)
              | A(6)

   A(6) ::= A(7) "*" A(7)
          | A(7)

   A(7..10) ::= "C" B(0)
                 | "(" A(0) ")"
 }}}
 Then, we see that we have overlap in the first two grammar parts. We
 get rid of it by using optional [ ]'s:
 {{{
   A(0..5) ::= A(6) ["+" A(6)]
   A(6) ::= A(7) ["*" A(7)]
   A(7..10) ::= "C" B(0)
                 |   "(" A(0) ")"
 }}}
 This can be turned into efficient Haskell code directly.

 -------------------------------------------------------------------------

 Now a more complicated case. For the following Haskell code:
 {{{
   infix 5 +
   infix 6 *

   data A
     = A + A
     | B * A    -- this operator is not left-recursive
     | C B
 }}}
 We generate the following grammar:
 {{{
   A(n) ::= A(6) "+" A(6)       (n<=5)
          | B(7) "*" A(7)       (n<=6)
          | "C" B(0)
          | "(" A(0) ")"
 }}}
 Right now, you simply turn the above into a parser directly. Again, we
 are going to "massage" the grammar. First, we explicitly split up the
 grammar into parts, depending on precedence.
 {{{
   A(0..5) ::= A(6) "+" A(6)
              | A(6)

   A(6) ::= B(7) "*" A(7)
          | A(7)

   A(7..10) ::= "C" B(0)
                 | "(" A(0) ")"
 }}}
 Unfortunately, there is no (explicit) overlap in the cases for A(6).
 We know that there probably exists overlap (both grammars will accept
 any number of parentheses), but this is not clear, since we do not
 have B's grammar.

 We can get rid of some overlap by using optional [ ]'s:
 {{{
   A(0..5) ::= A(6) ["+" A(6)]
   A(6) ::= B(7) "*" A(7)
            | A(7)
   A(7..10) ::= "C" B(0)
                 |   "(" A(0) ")"
 }}}
 This can be turned into Haskell code directly:
 {{{
 readP n
   | n <= 5 =
       do x <- readP 6
          return x +++ do "+" <- lex
                          y <- readP 6
                          return (x+y)
   | n <= 6 =
       do x <- readP 7 -- :: B
          "*" <- lex
          y <- readP 7
          return (x*y)
   +++ do readP 7 -- :: A
   | otherwise =
       do "C" <- lex
          x <- readP 0
          return (C x)
   +++ do "(" <- lex
          x <- readP 0
          ")" <- lex
          return x
 }}}
 However, this code will
 be inefficient when parsing A(6)'s that start with lots of
 parentheses.

-- 
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/1544>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
_______________________________________________
Glasgow-haskell-bugs mailing list
Glasgow-haskell-bugs@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs

Reply via email to