Conor McBride wrote: > Neither Oleg nor Bruno translated my code; they threw away my > structurally recursive on-the-fly automaton and wrote combinator parsers > instead. That's why there's no existential, etc. The suggestion that > removing the GADT simplifies the code would be better substantiated if > like was compared with like.
The following code specifically tries to be as close to the original Conor McBride's code as possible. I only removed local type annotations (turn out unnecessary) and replaced `ap' with `liftM' as a matter of personal preference. > Where we came in, if you recall, was the use of a datatype to define > regular expressions. I used a GADT to equip this type with some useful > semantic information (namely a type of parsed objects), and I wrote a > program (divide) which exploited this representation to compute the > regexp the tail of the input must match from the regexp the whole input > must match. This may be a bit of a weird way to write a parser, but it's > a useful sort of thing to be able to do... That particular feature is fully preserved. Data types are used to define regular expression: > p = (Star (Mult (Star (Check (== 'a'))) (Star (Check (== 'b'))))) > > testp = parse p "abaabaaabbbb" Moreover, *RX> :t p p :: Star (Mult (Star (Check Char)) (Star (Check Char))) the regular expression is apparent in the type of the `parser' (or its representation, to be precise). One can easily define data types Digit, Alphanumeric, WhiteSpace, etc. and so type will look more informative. Here's one step to regular types... Instead of GADT, a type class is used to associate semantic information with the data type (with labels). Typing is more explicit now. > And that's the point: not only does the GADT have all the disadvantages > of a closed datatype, it has all the advantages too! A type class can be closed too, thanks to functional dependencies. You were right: the code below is just the mechanical reshuffling of the original code, using the translation I come across a year ago. `empty' and `division' are now grouped by the parser type, which I personally like. It is hard to see any loss of expressiveness by getting rid of GADTs in this case. More explicit type is an advantage. Also an advantage is the fact that `pattern match' is guaranteed exhaustive. If I forgot to implement some particular parser, that would be a type error (rather than a pattern-match run-time error, as is the case with ADT/GADT). > Moreover, once you have to start messing about with equality > constraints, the coding to eliminate GADTs becomes a lot hairier. I > implemented it back in '95 (when all these things had different names) > and I was very glad not to have to do it by hand any more. The discovery of the presence of type equality in Haskell changes things a great deal. It would be interesting to see how your code of '95 will look now. Writing an equality and even _dis-equality_ constraint is just as simple as |TypeEq a b HTrue| or |TypeEq a b HFalse|. {-# OPTIONS -fglasgow-exts #-} module RX where import Control.Monad class RegExp r tok a | r tok -> a where empty :: r -> [tok] -> Maybe a divide :: tok -> r -> Division tok a data Zero = Zero data One = One newtype Check tok = Check (tok -> Bool) data Plus r1 r2 = Plus r1 r2 data Mult r1 r2 = Mult r1 r2 newtype Star r = Star r data Empty = Empty data Division tok x = forall r y. RegExp r tok y => Div r (y -> x) parse :: (RegExp r tok x) => r -> [tok] -> Maybe x parse r [EMAIL PROTECTED] = empty r t parse r (t : ts) = case divide t r of Div q f -> liftM f $ parse q ts nullt :: tok -> [tok] nullt _ = [] instance RegExp Zero tok Empty where empty _ _ = mzero divide _ _ = Div Zero naughtE instance RegExp One tok () where empty _ _ = return () divide _ _ = Div Zero naughtE instance RegExp (Check tok) tok tok where empty _ _ = mzero divide t (Check p) | p t = Div One (const t) divide _ _ = Div Zero naughtE instance (RegExp r1 tok a, RegExp r2 tok b) => RegExp (Plus r1 r2) tok (Either a b) where empty (Plus r1 r2) t = (liftM Left $ empty r1 t) `mplus` (liftM Right $ empty r2 t) divide t (Plus r1 r2) = case (divide t r1, divide t r2) of (Div q1 f1, Div q2 f2) -> Div (Plus q1 q2) (f1 +++ f2) instance (RegExp r1 tok a, RegExp r2 tok b) => RegExp (Mult r1 r2) tok (a,b) where empty (Mult r1 r2) t = liftM2 (,) (empty r1 t) (empty r2 t) divide t (Mult r1 r2) = case (empty r1 (nullt t), divide t r1, divide t r2) of (Nothing, Div q1 f1, _) -> Div (Mult q1 r2) (f1 *** id) (Just x1, Div q1 f1, Div q2 f2) -> Div (Plus (Mult q1 r2) q2) (either (f1 *** id) (((,) x1) . f2)) instance RegExp r tok a => RegExp (Star r) tok [a] where empty _ _ = return [] divide t (Star r) = case (divide t r) of Div q f -> Div (Mult q (Star r)) (\ (y, xs) -> (f y : xs)) (***) :: (a -> b) -> (c -> d) -> (a, c) -> (b, d) (f *** g) (a, c) = (f a, g c) (+++) :: (a -> b) -> (c -> d) -> Either a c -> Either b d (f +++ g) (Left a) = Left (f a) (f +++ g) (Right c) = Right (g c) naughtE :: Empty -> x naughtE = undefined p = (Star (Mult (Star (Check (== 'a'))) (Star (Check (== 'b'))))) testp = parse p "abaabaaabbbb" _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe