Robert Atkey wrote: > A deep embedding of a parsing DSL (really a context-sensitive grammar > DSL) would look something like the following. I think I saw something > like this in the Agda2 code somewhere, but I stumbled across it when I > was trying to work out what "free" applicative functors were. > > [snip code & explanation]
This is extremely cool. I tried to understand in my head how this all works but it just didn't click. It all seemed like magic. Then I went ahead and tried to write a printer for your example grammar and now everything is much clearer. Although I had to fight the type checker quite a bit. This is the generic part: > class Print f where > pr :: f a -> String > instance Print nt => Print (Production nt) where > pr = printProduction > printProduction :: Print nt => Production nt a -> String > printProduction (Stop _) = "" > printProduction (Terminal t (Stop _)) = show t > printProduction (Terminal t p) = show t ++ " " ++ printProduction p > printProduction (NonTerminal nt (Stop _)) = pr nt > printProduction (NonTerminal nt p) = pr nt ++ " " ++ printProduction p > instance Print nt => Print (Rule nt) where > pr (Rule ps) = printPs ps where > printPs [] = "" > printPs [p] = printProduction p > printPs (p:ps) = printProduction p ++ " | " ++ printPs ps > data Any f = forall a. Any (f a) > class Enumerable f where > enumeration :: [Any f] > printRule :: Print nt => (nt a -> Rule nt a) -> nt a -> String > printRule g nt = pr nt ++ " ::= " ++ pr (g nt) > printGrammar :: (Print nt, Enumerable nt) => Grammar nt -> String > printGrammar g = foldr (++) "" (intersperse "\n" rules) where > rules = map printAnyRule enumeration > printAnyRule (Any nt) = printRule g nt We must also provide instances for the concrete types: > instance Enumerable NT where > enumeration = [Any Sum, Any Product, Any Value] > instance Print NT where > pr Value = "Value" > pr Product = "Product" > pr Sum = "Sum" So far so good. This even works... almost ;-) *Main> putStrLn $ printGrammar myGrm Sum ::= Product '+' Sum | Product Product ::= Value '*' Product | Value Value ::= Interrupted. -- had to hit Ctrl-C here When I replace 'posInt' with 'digit' in the rule for Value > myGrm Value = ENum <$> digit > <|> id <$ char '(' <*> nt Sum <* char ')' then the printer terminates just fine: *Main> putStrLn $ printGrammar myGrm Sum ::= Product '+' Sum | Product Product ::= Value '*' Product | Value Value ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | '(' Sum ')' I found that the problem is the use of function 'some' from Control.Applicative in > posInt :: Rule nt Int > posInt = fix 1 . reverse <$> some digit where > fix n [] = 0 > fix n (d:ds) = d*n + fix (n*10) ds Since 'some' is defined recursively, this creates an infinite production for numbers that you can neither print nor otherwise analyse in finite time. I can see at least two solutions: One is to parameterize everything over the type of terminals, too. A type suitable for the example would be > data T = TNum Int | TPlus | TMult | TOParen | TCParen and leave token recognition to a separate scanner. The second solution (which I followed) is to break the recursion by adding another nonterminal to the NT type: > data NT a where > Sum :: NT Expr > Product :: NT Expr > Value :: NT Expr > Number :: NT [Int] > Digit :: NT Int > instance Enumerable NT where > enumeration = [Any Sum, Any Product, Any Value, Any Number, Any Digit] > instance Print NT where > pr Sum = "Sum" > pr Product = "Product" > pr Value = "Value" > pr Number = "Number" > pr Digit = "Digit" (Adding Digit /and/ Number is not strictly necessary, but it makes for a nicer presentation.) > myGrm :: Grammar NT > myGrm Sum = ESum <$> nt Product <* char '+' <*> nt Sum > <|> id <$> nt Product > > myGrm Product = EProduct <$> nt Value <* char '*' <*> nt Product > <|> id <$> nt Value > > myGrm Value = (ENum . toNat) <$> nt Number > <|> id <$ char '(' <*> nt Sum <* char ')' > > myGrm Number = extend <$> nt Digit <*> optional (nt Number) > > myGrm Digit = digit > extend d Nothing = [d] > extend d (Just ds) = d:ds > toNat :: [Int] -> Int > toNat = fix 1 . reverse where > fix n [] = 0 > fix n (d:ds) = d*n + fix (n*10) ds With this I get *Main> putStrLn $ printGrammar myGrm Sum ::= Product '+' Sum | Product Product ::= Value '*' Product | Value Value ::= Number | '(' Sum ')' Number ::= Digit Number | Digit Digit ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' Morale: Be careful with recursive functions when constructing a data representation (e.g. for a deep DSL). You might get an infinite representation which isn't what you want in this case. Oh, and another point: there should be a distinguished "start" nonterminal, otherwise this is not really a grammar. This suggests something like > type Grammar nt a = (nt a, forall b. nt b -> Rule nt b) Next thing I'll try is to transform such a grammar into an actual parser... Cheers Ben _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe