Re: [Haskell-cafe] Regular Expressions without GADTs

Conor McBride Wed, 19 Oct 2005 07:20:25 -0700

[EMAIL PROTECTED] wrote:

I suspect that once you start producing values with the
relevant properties (as I do) rather than just consuming them (as Oleg
and Bruno do), the GADT method might work out a little neater.


Actually, the code is pretty much your original code, with downcased
identifiers. It faithfully implements that parser division
approach. Still, there are no existentials.

That's because this isn't quite my program either, although I'll grantyou it's a lot closer. Give me back the uppercase identifiers and you'vepretty much done it. That's to say, I want a datatype of regularexpressions which happens (at least) to support parsing, rather than aparser library which supports (at least) regular languages.


We can leave it to others to decide which is neater, but I'd remark that...

I wouldn't say that GADT
code is so much different. Perhaps the code below is a bit neater due
to the absence of existentials, `case' statements, and local type
annotations.

...I'm also irritated by the case statements, the local typeannotations, and so on. I was rather surprised to find that I neededthem. I bet it's not an essential or permanent problem, and it'scertainly a problem we have planned not to have in Epigram.

The existential also bothers me, but for different reasons. In thedependently typed version, RegExp isn't a GADT: it's just an ordinarydatatype, and there's a perfectly ordinary function, Sem :: RegExp -> *,which computes the type corresponding to a regular expression. Thesyntactic part of division is thus completely separated from thesemantic part (glueing the head back on). The GADT version here replacesSem by an index in the datatype, so sometimes I'm forced to write anexistential in order to collect what would have been the result ofapplying Sem. You've avoided the existential by throwing away the syntaxand keeping only the specific empty-and-divide semantics required forthis particular example. In effect

* I define regular expressions and show they possess at least theempty-and-divide semantics;* You define the empty-and-divide semantics and show that it's closedunder at least the regular expression formers (nicely expressed viaBruno's class)


Let's see...

Here's the class of regular expression algebras:

-- Bruno.Oliveira's type class
class RegExp g where
   zero   :: g tok Zero
   one    :: g tok ()
   check  :: (tok -> Bool) -> g tok tok
   plus   :: g tok a -> g tok b -> g tok (Either a b)
   mult   :: g tok a -> g tok b -> g tok (a,b)
   star   :: g tok a -> g tok [a]


Here's the specification of the intended semantics:

data Parse tok t = Parse { empty :: Maybe t
                         , divide :: tok -> Parse tok t}

Here's where you explain how stuff other than regular expressions alsohas the semantics. This is how the existential is made to disappear: youuse liftP to fuse the head-glueing function into the parsers themselves.It's something like adding


 Map :: (s -> t) -> RegExp tok s -> RegExp tok t

to the syntax of regular expressions, isn't it?

liftP f p = Parse{empty  = liftM f (empty p),
                  divide = \tok -> liftP f (divide p tok)}

And then you show that indeed the desired semantics is closed underRegExp formation.

instance RegExp Parse  where

It's a classic instance of the Expression Problem. If you fix theelimination behaviour, it's easy to add new introduction behaviour; ifyou fix the introduction behaviour, it's easy to add new eliminationbehaviour. I think it's important to support both modes of working: typeclasses are good at the former, datatypes the latter, so let's be opento both. There's always a danger with small examples that there's not somuch to choose between the two approaches---except that they scale up incompletely different ways. It's important not to get into a sterileargument 'Oh, but I can extend the grammar', 'Oh, but I can extend thefunctionality', as if only one of these is ever important. I willhappily cheer anyone who can help us to be lighter on our feet injumping between the two.

I'm not saying there's no value to 'poor man's' alternatives to GADTs. Icertainly agree with Bruno(a) that it's important to know how to work within the standard; myexperiences implementing the first version of Epigram have convinced meto be much more cautious about which Haskell extensions I'm willing touse in the second(b) that the pattern 'show type constructor f has such-an-such analgebra' is useful and worth abstracting.

I was just about to send, when up popped Ross's program, which does giveme a datatype, including just enough extra stuff to support functoriality.

data RegExp tok a
        = Zero
        | One a
        | Check (tok -> a) (tok -> Bool)
        | Plus (RegExp tok a) (RegExp tok a)
        | forall b. Mult (RegExp tok (b -> a)) (RegExp tok b)
        | forall e. Star ([e] -> a) (RegExp tok e)

This combines several dodges rather neatly. You can see the traces ofthe translation from GADT to nested-datatype-with-equations, but insteadof saying forall e. Star (Eq [e] a) (RegExp tok e), Ross has kept justthe half of the iso he needs in order to extract the parsed value.Throwing away the other half more-or-less allows him to hide thehead-glueing functions inside the grammar of the regular expressions. Ineffect, Map has been distributed through the syntax. It's nice that thesyntax of regular expressions survives, but it has been somewhat spun infavour of the functionality required in the example. Of the fakes sofar, I like this the best, because it is *data*.

I would say that the availability of 'this workaround for this example,that workaround for that example', is evidence in *favour* of adoptingthe general tools which are designed to support many examples. If ajob's worth doing, it's worth doing well. Is there a *uniform*workaround which delivers all the GADT functionality cheaply andcleanly? Kind of subjective, I suppose, but I haven't seen anythingcheap enough for me.


All the best

Conor
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Regular Expressions without GADTs

Reply via email to