Re: [Haskell-cafe] Regular Expressions without GADTs

2005-10-19 Thread Bruno Oliveira
Hello Conor,

I personally think GADTs are a great idea, don't get me wrong. I am not arguing
that we should all start programming using on a Church encoding. Neither I
am arguing that this encoding can replace GADTs in all cases.

Nevertheless, I believe that knowing about this encoding, in the first
place, can be extremely valuable for programmers. Let's not forget that neither
GADTs or multiple parameter type classes are Haskell 98! In fact, as far as
I understood from the Haskell Workshop, we are a long way from those extensions
becoming standard. My point is that if you are developing software,
then you want to use something standard (this is, running on Hugs, Nhc,
Ghc, Ehc, ...).
This encoding tries to avoid extensions (not saying that it does for all
cases).

In the end, that is what most design patterns are: workarounds on missing
language features. In this case missing features of Haskell 98 (but not
GHC Haskell).

Best Regards,

Bruno Oliveira
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Regular Expressions without GADTs

2005-10-19 Thread Ross Paterson
On Tue, Oct 18, 2005 at 05:51:13PM +0100, Conor McBride wrote:
| Neither Oleg nor Bruno translated my code; they threw away my 
| structurally recursive on-the-fly automaton and wrote combinator parsers 
| instead. That's why there's no existential, etc. The suggestion that 
| removing the GADT simplifies the code would be better substantiated if 
| like was compared with like.

Here's a version in H98 + existentials.  I'm not claiming it proves
anything.

 module RegExp where

 import Control.Monad

 data RegExp tok a
   = Zero
   | One a
   | Check (tok - a) (tok - Bool)
   | Plus (RegExp tok a) (RegExp tok a)
   | forall b. Mult (RegExp tok (b - a)) (RegExp tok b)
   | forall e. Star ([e] - a) (RegExp tok e)

I could just use Prod (One f) instead of fmap f, but the functor
instance is mechanical:

 instance Functor (RegExp tok) where
   fmap f Zero = Zero
   fmap f (One v) = One (f v)
   fmap f (Check k p) = Check (f.k) p
   fmap f (Plus r1 r2) = Plus (fmap f r1) (fmap f r2)
   fmap f (Mult r1 r2) = Mult (fmap (f.) r1) r2
   fmap f (Star k r) = Star (f.k) r

Constructors

 one :: RegExp tok ()
 one = One ()

 check :: (tok - Bool) - RegExp tok tok
 check = Check id

 plus :: RegExp tok a - RegExp tok b - RegExp tok (Either a b)
 plus r1 r2 = Plus (fmap Left r1) (fmap Right r2)

 mult :: RegExp tok a - RegExp tok b - RegExp tok (a,b)
 mult r1 r2 = Mult (fmap (,) r1) r2

 star :: RegExp tok a - RegExp tok [a]
 star = Star id

Parsing

 parse :: RegExp tok a - [tok] - Maybe a
 parse r []   = empty r
 parse r (t : ts) = parse (divide t r) ts

 empty :: RegExp tok a - Maybe a
 empty Zero= mzero
 empty (One v) = return v
 empty (Check _ _) = mzero
 empty (Plus r1 r2)= empty r1 `mplus` empty r2
 empty (Mult r1 r2)= empty r1 `ap` empty r2
 empty (Star k _)  = return (k [])

 divide :: tok - RegExp tok a - RegExp tok a
 divide t Zero = Zero
 divide t (One v)  = Zero
 divide t (Check k p)
   | p t   = One (k t)
   | otherwise = Zero
 divide t (Plus r1 r2) = Plus (divide t r1) (divide t r2)
 divide t (Mult r1 r2) = case empty r1 of
   Nothing - Mult (divide t r1) r2
   Just f  - Plus (Mult (divide t r1) r2) (fmap f (divide t r2))
 divide t (Star k r)   = Mult (fmap k_cons (divide t r)) (Star id r)
   where k_cons x xs = k (x:xs)

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Regular Expressions without GADTs

2005-10-19 Thread Conor McBride

[EMAIL PROTECTED] wrote:


I suspect that once you start producing values with the
relevant properties (as I do) rather than just consuming them (as Oleg
and Bruno do), the GADT method might work out a little neater.
   



Actually, the code is pretty much your original code, with downcased
identifiers. It faithfully implements that parser division
approach. Still, there are no existentials.



That's because this isn't quite my program either, although I'll grant 
you it's a lot closer. Give me back the uppercase identifiers and you've 
pretty much done it. That's to say, I want a datatype of regular 
expressions which happens (at least) to support parsing, rather than a 
parser library which supports (at least) regular languages.


We can leave it to others to decide which is neater, but I'd remark that...


I wouldn't say that GADT
code is so much different. Perhaps the code below is a bit neater due
to the absence of existentials, `case' statements, and local type
annotations.
 



...I'm also irritated by the case statements, the local type 
annotations, and so on. I was rather surprised to find that I needed 
them. I bet it's not an essential or permanent problem, and it's 
certainly a problem we have planned not to have in Epigram.


The existential also bothers me, but for different reasons. In the 
dependently typed version, RegExp isn't a GADT: it's just an ordinary 
datatype, and there's a perfectly ordinary function, Sem :: RegExp - *, 
which computes the type corresponding to a regular expression. The 
syntactic part of division is thus completely separated from the 
semantic part (glueing the head back on). The GADT version here replaces 
Sem by an index in the datatype, so sometimes I'm forced to write an 
existential in order to collect what would have been the result of 
applying Sem. You've avoided the existential by throwing away the syntax 
and keeping only the specific empty-and-divide semantics required for 
this particular example. In effect


* I define regular expressions and show they possess at least the 
empty-and-divide semantics;
* You define the empty-and-divide semantics and show that it's closed 
under at least the regular expression formers (nicely expressed via 
Bruno's class)


Let's see...

Here's the class of regular expression algebras:


-- Bruno.Oliveira's type class
class RegExp g where
   zero   :: g tok Zero
   one:: g tok ()
   check  :: (tok - Bool) - g tok tok
   plus   :: g tok a - g tok b - g tok (Either a b)
   mult   :: g tok a - g tok b - g tok (a,b)
   star   :: g tok a - g tok [a]
 



Here's the specification of the intended semantics:


data Parse tok t = Parse { empty :: Maybe t
 , divide :: tok - Parse tok t}
 



Here's where you explain how stuff other than regular expressions also 
has the semantics. This is how the existential is made to disappear: you 
use liftP to fuse the head-glueing function into the parsers themselves. 
It's something like adding


 Map :: (s - t) - RegExp tok s - RegExp tok t

to the syntax of regular expressions, isn't it?


liftP f p = Parse{empty  = liftM f (empty p),
  divide = \tok - liftP f (divide p tok)}
 



And then you show that indeed the desired semantics is closed under 
RegExp formation.



instance RegExp Parse  where
 



It's a classic instance of the Expression Problem. If you fix the 
elimination behaviour, it's easy to add new introduction behaviour; if 
you fix the introduction behaviour, it's easy to add new elimination 
behaviour. I think it's important to support both modes of working: type 
classes are good at the former, datatypes the latter, so let's be open 
to both. There's always a danger with small examples that there's not so 
much to choose between the two approaches---except that they scale up in 
completely different ways. It's important not to get into a sterile 
argument 'Oh, but I can extend the grammar', 'Oh, but I can extend the 
functionality', as if only one of these is ever important. I will 
happily cheer anyone who can help us to be lighter on our feet in 
jumping between the two.


I'm not saying there's no value to 'poor man's' alternatives to GADTs. I 
certainly agree with Bruno
 (a) that it's important to know how to work within the standard; my 
experiences implementing the first version of Epigram have convinced me 
to be much more cautious about which Haskell extensions I'm willing to 
use in the second
 (b) that the pattern 'show type constructor f has such-an-such an 
algebra' is useful and worth abstracting.


I was just about to send, when up popped Ross's program, which does give 
me a datatype, including just enough extra stuff to support functoriality.



data RegExp tok a
= Zero
| One a
| Check (tok - a) (tok - Bool)
| Plus (RegExp tok a) (RegExp tok a)
| forall b. Mult (RegExp tok (b - a)) (RegExp tok b)
| forall e. Star ([e] - a) (RegExp tok e)
 



This combines 

Re: [Haskell-cafe] Regular Expressions without GADTs

2005-10-19 Thread Ross Paterson
On Wed, Oct 19, 2005 at 03:17:41PM +0100, Conor McBride wrote:
 I was just about to send, when up popped Ross's program, which does give 
 me a datatype, including just enough extra stuff to support functoriality.
 
 data RegExp tok a
 = Zero
 | One a
 | Check (tok - a) (tok - Bool)
 | Plus (RegExp tok a) (RegExp tok a)
 | forall b. Mult (RegExp tok (b - a)) (RegExp tok b)
 | forall e. Star ([e] - a) (RegExp tok e)
 
 This combines several dodges rather neatly. You can see the traces of 
 the translation from GADT to nested-datatype-with-equations, but instead 
 of saying forall e. Star (Eq [e] a) (RegExp tok e), Ross has kept just 
 the half of the iso he needs in order to extract the parsed value. 

The direction I've kept is the one that matters if everything is
covariant, because

forall a. F a - T (G a)
~=
forall a, b. (G a - b) - F a - T b

if F, G and T are functors.  So we can translate

data T :: * - * where
C :: F a - T (G a)
to
data T b
= forall a. C (G a - b) (F a)

and the mechanical translation of your GADT is

data RegExp tok a
= Zero (Empty - a) Empty
| One (() - a) ()
| Check (tok - a) (tok - Bool) (RegExp tok a)
| forall b c. Plus (Either b c - a) (RegExp tok b) (RegExp tok c)
| forall b c. Mult ((b, c) - a) (RegExp tok b) (RegExp tok c)
| forall e. Star ([e] - a) (RegExp tok e)

A little simplification (ignoring lifted types here and there) yields
the version I gave.

And of course we're no longer assuming that the function is half of an iso.

 Throwing away the other half more-or-less allows him to hide the 
 head-glueing functions inside the grammar of the regular expressions. In 
 effect, Map has been distributed through the syntax.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Regular Expressions without GADTs

2005-10-19 Thread Ross Paterson
On Wed, Oct 19, 2005 at 03:17:41PM +0100, Conor McBride wrote:
 It's nice that the syntax of regular expressions survives, but it
 has been somewhat spun in favour of the functionality required in
 the example.

I'm not sure about that either -- the existential coding supports
elimination too, if the result type is functorial (again specializing
the general scheme):

caseRegExp
:: Functor r
= r Empty
- r ()
- ((tok - Bool) - r tok)
- (forall a b. RegExp tok a - RegExp tok b - r (Either a b))
- (forall a b. RegExp tok a - RegExp tok b - r (a, b))
- (forall a. RegExp tok a - r [a])
- RegExp tok v - r v
caseRegExp fZero fOne fCheck fPlus fMult fStar x = case x of
Zero- fmap (const bot) fZero
One v   - fmap (const v)   fOne
Check k p   - fmap k   (fCheck p)
Plus r1 r2  - fmap (either id id)  (fPlus r1 r2)
Mult r1 r2  - fmap (uncurry id)(fMult r1 r2)
Star k r- fmap k   (fStar r)
  where bot = error can't happen

Not that this is comfortable to use, but it does show that existentials
are as expressive, assuming the functor: they can't do RegExp t a - a - a.

 I would say that the availability of 'this workaround for this example, 
 that workaround for that example', is evidence in *favour* of adopting 
 the general tools which are designed to support many examples. If a 
 job's worth doing, it's worth doing well. Is there a *uniform* 
 workaround which delivers all the GADT functionality cheaply and 
 cleanly? Kind of subjective, I suppose, but I haven't seen anything 
 cheap enough for me.

It seems the existential coding is uniform, if everything is covariant.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Regular Expressions without GADTs

2005-10-18 Thread Bruno Oliveira
Hello,

You can also write this code in Haskell 98. Jeremy Gibbons and I have
recently written a paper entitled TypeCase: A design pattern for
type-indexed functions where we explore the use of some techniques from
lightweight approaches to generic programming (Ralf Hinze and James
Cheney's work) for other purposes.
If you are interested you can take a look at:

http://web.comlab.ox.ac.uk/oucl/work/bruno.oliveira/typecase.pdf

Basically, we can use type classes and GADTs as two alternatives approaches
to implement type-based case analysis. A well-known example of this is
printf. With printf we want to construct a format string that determines the
type of printf. Something similar happens here, we want to build a parser based
on the type of the regular expression.

There is nothing ingenious with this solution --- I basically translated
Oleg's and Conor's code. In the paper we discuss in details different
trade-offs between different implementations (e.g. GADT vs type classes).
For instance, here is something you cannot do (easily) with the GADT
approach:

 myParser :: MonadPlus m = Parse m tok Int
 myParser = Parse (\_ - return (1::Int))

 test = asList (parse myParser ole)

Basically, with the code that follows you can define your own customized
Regular expressions that you can use with your parser. (Sorry for the
stupid example, but just to give you the idea).

Ultimately you can reason on your programs (of this kind!) using GADTs or
multiple-parameter
type classes with functional dependencies and them translate them into
Haskell 98.

Here is the code:

 module RegExps where

 import Monad

 newtype Zero = Zero Zero -- Zero type in Haskell 98

 class RegExp g where
 zero   :: g tok Zero
 one:: g tok ()
 check  :: (tok - Bool) - g tok tok
 plus   :: g tok a - g tok b - g tok (Either a b)
 mult   :: g tok a - g tok b - g tok (a,b)
 push   :: tok - g tok r - g tok r
 star   :: g tok a - g tok [a]

 newtype Parse m tok t = Parse {parse :: [tok] - m t}

 instance MonadPlus m = RegExp (Parse m) where
 zero= Parse (\_ - mzero)
 one = Parse (\l -
 case l of
 [] - return ()
 _  - mzero)
 check p = Parse (\l -
 case l of
 [t] - if (p t) then return t else mzero
 _   - mzero)
 plus r1 r2  = Parse (\l - (liftM Left $ parse r1 l) `mplus`
(liftM Right $ parse r2 l))
 push tok r  = Parse (\l - parse r (tok:l))
 mult r1 r2  = Parse (\l -
 case l of
 [] - liftM2 (,) (parse r1 l) (parse r2 l)
 (t:ts) - parse (mult (push t r1) r2) ts `mplus`
   liftM2 (,) (parse r1 ([] `asTypeOf` ts)) (parse r2
(t:ts)))
 star r  = Parse (\l -
 case l of
 [] - return []
 ts - parse (mult r (star r)) ts = (\(x,xs) - return
$ x : xs))

Problem with the monomorphism restriction

 p1 :: RegExp g = g Char ([Char], [Char])
 p1 = mult (star (check (== 'a'))) (star (check (== 'b')))

p1 = (Mult (Star (Check (== 'a'))) (Star (Check (== 'b'

 asMayBe :: Maybe a - Maybe a
 asMayBe = id

 asList :: [a] - [a]
 asList = id

 testp = asMayBe $
   parse (star (mult (star (check (== 'a'))) (star (check (==
'b')
   abaabaaa

*RX testp
Just [(a,b),(aa,b),(aaa,)]

-- see alternative parses

 testp1 = take 3 $ asList $
   parse (star (mult (star (check (== 'a'))) (star (check (==
'b')
   abaabaaa


-- Parsing the list of integers

 ieven = even :: Int-Bool
 iodd  = odd  :: Int-Bool

 p2 :: RegExp g = g Int (Either (Int, (Int, [Int])) (Int, [Int]))
 p2 = plus (mult (check iodd) (mult (check iodd) (star (check ieven
   (mult (check ieven) (star (check iodd)))

-- the parser is ambiguous. We can see the alternatives

 test2 = take 3 $ asList $ parse (star p2) [1::Int,1,2,3,3,4]

Connor's code for empty.

 {-
 newtype Empty tok t = Empty {empty :: Maybe t}

 instance RegExp Empty where
 zero= Empty mzero
 one = Empty (return ())
 check _ = Empty mzero
 plus r1 r2  = Empty ((return Left `ap` empty r1) `mplus`
  (return Right `ap` empty r2))
 mult r1 r2  = Empty (return (,) `ap` empty r1 `ap` empty r2)
 star _  = Empty (return [])
 -}

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Regular Expressions without GADTs

2005-10-18 Thread Conor McBride

Hi folks


There is nothing ingenious with this solution --- I basically translated
Oleg's and Conor's code.



Neither Oleg nor Bruno translated my code; they threw away my 
structurally recursive on-the-fly automaton and wrote combinator parsers 
instead. That's why there's no existential, etc. The suggestion that 
removing the GADT simplifies the code would be better substantiated if 
like was compared with like. By the way, the combinator parsers, as 
presented, don't quite work: try


 asMayBe $ parse (Star One) I really must get to the bottom of this...

with Oleg's code, or the analogous (star one) with Bruno's.

But it's almost as easy a mistake to fix (I hope) as it is to make, so 
the point that the GADT isn't strictly necessary does stand. I wouldn't 
swear my code's correct either, but I don't think it loops on finite input.


Oleg simulates GADTs with Ye Olde Type Class Trick, getting around the 
fact that type classes aren't closed by adding each necessary piece of 
functionality to the methods. By contrast, Bruno goes straight for the 
Church encoding, as conveniently packaged by type classes: even if you 
can't write down the (morally) *initial* RegExp algebra, you can write 
down what RegExp algebras are in general. These two approaches aren't 
mutually exclusive: you can use Bruno's algebras to capture the 
canonical functionality which should be supported for all types in 
Oleg's class. Then you're pretty much writing down that a GADT is an 
inductive relation, which perhaps you had guessed already...


Where we came in, if you recall, was the use of a datatype to define 
regular expressions. I used a GADT to equip this type with some useful 
semantic information (namely a type of parsed objects), and I wrote a 
program (divide) which exploited this representation to compute the 
regexp the tail of the input must match from the regexp the whole input 
must match. This may be a bit of a weird way to write a parser, but it's 
a useful sort of thing to be able to do. There are plenty of other 
examples of 'auxiliary' types computed from types to which they relate 
in some interesting way, the Zipper being the a favourite. It's useful 
to have a syntax (Generic Haskell, GADTs etc) which makes that look as 
straightforward and concrete as possible.


I'm sure the program I actually wrote can be expressed with the type 
class trick, just by cutting up my functions and pasting the bits into 
individual instances; the use of the existential is still available. I 
don't immediately see how to code this up in Bruno's style, but that 
doesn't mean it can't be done. Still, it might be worth comparing like 
with like. I suspect that once you start producing values with the 
relevant properties (as I do) rather than just consuming them (as Oleg 
and Bruno do), the GADT method might work out a little neater.


And that's the point: not only does the GADT have all the disadvantages 
of a closed datatype, it has all the advantages too! It's easy to write 
new functions which both consume and produce data, without having to go 
back and change the definition. I didn't have to define Division or 
divide in order to say what a regular expression was. If I remember 
rightly, regular languages are closed under a whole bunch of useful 
stuff (intersection, complementation, ...) so someone who can remember 
better than me might implement the corresponding algorithms together 
with their related operations; they don't need to change the definition 
of RegExp to do that. Might we actually decide semantic subtyping 
(yielding a coercion) that way?


So what's the point? GADTs are a very convenient way to provide data 
which describe types. Their encoding via type classes, or
whatever, may be possible, but this encoding is not necessarily as 
convenient as the GADT. Whether or not the various encodings of GADTs 
deliver more convenience in this example is not clear, because the 
programs do not correspond. Moreover, once you have to start messing 
about with equality constraints, the coding to eliminate GADTs becomes a 
lot hairier. I implemented it back in '95 (when all these things had 
different names) and I was very glad not to have to do it by hand any more.


But your mileage may vary

Conor
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Regular Expressions without GADTs

2005-10-18 Thread oleg

Conor McBride wrote:
 Neither Oleg nor Bruno translated my code; they threw away my
 structurally recursive on-the-fly automaton and wrote combinator parsers
 instead. That's why there's no existential, etc. The suggestion that
 removing the GADT simplifies the code would be better substantiated if
 like was compared with like.

The following code specifically tries to be as close to the original
Conor McBride's code as possible. I only removed local type
annotations (turn out unnecessary) and replaced `ap' with `liftM' as a
matter of personal preference.

 Where we came in, if you recall, was the use of a datatype to define
 regular expressions. I used a GADT to equip this type with some useful
 semantic information (namely a type of parsed objects), and I wrote a
 program (divide) which exploited this representation to compute the
 regexp the tail of the input must match from the regexp the whole input
 must match. This may be a bit of a weird way to write a parser, but it's
 a useful sort of thing to be able to do...

That particular feature is fully preserved. Data types are used to
define regular expression:

 p = (Star (Mult (Star (Check (== 'a'))) (Star (Check (== 'b')

 testp = parse p abaabaaa

Moreover,
  *RX :t p
  p :: Star (Mult (Star (Check Char)) (Star (Check Char)))

the regular expression is apparent in the type of the `parser' (or its
representation, to be precise). One can easily define data types
Digit, Alphanumeric, WhiteSpace, etc. and so type will look more
informative. Here's one step to regular types... Instead of GADT, a
type class is used to associate semantic information with the data
type (with labels). Typing is more explicit now.

 And that's the point: not only does the GADT have all the disadvantages
 of a closed datatype, it has all the advantages too!

A type class can be closed too, thanks to functional dependencies.

You were right: the code below is just the mechanical reshuffling of
the original code, using the translation I come across a year
ago. `empty' and `division' are now grouped by the parser type, which
I personally like. It is hard to see any loss of expressiveness by
getting rid of GADTs in this case. More explicit type is an
advantage. Also an advantage is the fact that `pattern match' is
guaranteed exhaustive. If I forgot to implement some particular
parser, that would be a type error (rather than a pattern-match
run-time error, as is the case with ADT/GADT).


 Moreover, once you have to start messing about with equality
 constraints, the coding to eliminate GADTs becomes a lot hairier. I
 implemented it back in '95 (when all these things had different names)
 and I was very glad not to have to do it by hand any more.

The discovery of the presence of type equality in Haskell changes
things a great deal. It would be interesting to see how your code of
'95 will look now. Writing an equality and even _dis-equality_
constraint is just as simple as |TypeEq a b HTrue| or |TypeEq a b
HFalse|.

{-# OPTIONS -fglasgow-exts #-}

module RX where

import Control.Monad

class RegExp r tok a | r tok - a where
   empty :: r - [tok] - Maybe a
   divide :: tok - r - Division tok a

data Zero = Zero
data One = One
newtype Check tok = Check (tok - Bool)
data Plus r1 r2 = Plus r1 r2
data Mult r1 r2 = Mult r1 r2
newtype Star r = Star r

data Empty = Empty

data Division tok x = forall r y. RegExp r tok y = Div r (y - x)

parse :: (RegExp r tok x) = r - [tok] - Maybe x
parse r [EMAIL PROTECTED]   = empty r t
parse r (t : ts) = case divide t r of
  Div q f - liftM f $ parse q ts

nullt :: tok - [tok]
nullt _ = []

instance RegExp Zero tok Empty where
empty  _ _ = mzero
divide _ _  = Div Zero naughtE

instance RegExp One tok () where
empty  _ _ = return ()
divide _ _  = Div Zero naughtE

instance RegExp (Check tok) tok tok where
empty  _ _ = mzero
divide t (Check p) | p t= Div One (const t)
divide _ _  = Div Zero naughtE

instance (RegExp r1 tok a, RegExp r2 tok b) 
= RegExp (Plus r1 r2) tok (Either a b) where
empty (Plus r1 r2) t = (liftM Left  $ empty r1 t) `mplus`
   (liftM Right $ empty r2 t)
divide t (Plus r1 r2) =
  case (divide t r1, divide t r2) of
(Div q1 f1, Div q2  f2) -
  Div (Plus q1 q2) (f1 +++ f2)

instance (RegExp r1 tok a, RegExp r2 tok b) 
= RegExp (Mult r1 r2) tok (a,b) where
empty (Mult r1 r2) t = liftM2 (,) (empty r1 t) (empty r2 t)
divide t (Mult r1 r2) = 
case (empty r1 (nullt t), divide t r1, divide t r2) of
 (Nothing, Div q1 f1, _) - Div (Mult q1 r2) (f1 *** id)
 (Just x1, Div q1 f1, Div q2 f2) -
 Div (Plus (Mult q1 r2) q2) 
 (either (f1 *** id) (((,) x1) . f2))

instance RegExp r tok a = RegExp (Star r) tok [a] where
empty  _ _ = return []
divide t (Star r) = case (divide t r) of

Re: [Haskell-cafe] Regular Expressions without GADTs

2005-10-18 Thread oleg

Conor McBride wrote:
 Neither Oleg nor Bruno translated my code; they threw away my
 structurally recursive on-the-fly automaton and wrote combinator parsers
 instead. That's why there's no existential, etc. The suggestion that
 removing the GADT simplifies the code would be better substantiated if
 like was compared with like.
...

 I'm sure the program I actually wrote can be expressed with the type
 class trick, just by cutting up my functions and pasting the bits into
 individual instances; the use of the existential is still available. I
 don't immediately see how to code this up in Bruno's style, but that
 doesn't mean it can't be done. Still, it might be worth comparing like
 with like.

Please see the enclosed code. It is still in Haskell98 -- and works in
Hugs. 

 I suspect that once you start producing values with the
 relevant properties (as I do) rather than just consuming them (as Oleg
 and Bruno do), the GADT method might work out a little neater.

Actually, the code is pretty much your original code, with downcased
identifiers. It faithfully implements that parser division
approach. Still, there are no existentials. I wouldn't say that GADT
code is so much different. Perhaps the code below is a bit neater due
to the absence of existentials, `case' statements, and local type
annotations.

{-- Haskell98! --}

module RegExps where

import Monad

newtype Zero = Zero Zero -- Zero type in Haskell 98

-- Bruno.Oliveira's type class
class RegExp g where
zero   :: g tok Zero
one:: g tok ()
check  :: (tok - Bool) - g tok tok
plus   :: g tok a - g tok b - g tok (Either a b)
mult   :: g tok a - g tok b - g tok (a,b)
star   :: g tok a - g tok [a]

data Parse tok t = Parse { empty :: Maybe t
 , divide :: tok - Parse tok t}


parse :: Parse tok a - [tok] - Maybe a
parse p [] = empty p
parse p (t:ts) = parse (divide p t) ts

liftP f p = Parse{empty  = liftM f (empty p),
  divide = \tok - liftP f (divide p tok)}

liftP2 mf p1 p2 = Parse{empty = mf (empty p1) (empty p2),
divide = \tok - liftP2 mf (divide p1 tok) 
   (divide p2 tok)}

lsum x y  = (liftM Left x) `mplus` (liftM Right y)
lprod x y = liftM2 (,) x y

-- Conor McBride's parser division algorithm

instance RegExp Parse  where
zero= Parse mzero (\_ - zero)
one = Parse (return ()) (\_ - liftP (const ()) zero)
check p = Parse mzero (\t - if p t 
then liftP (const t) one
else liftP (const t) zero)

plus r1 r2  = Parse (lsum (empty r1) (empty r2))
(\t - plus (divide r1 t) (divide r2 t))


mult r1 r2  = Parse (lprod (empty r1) (empty r2))
(\t - let (q1,q2) = (divide r1 t, divide r2 t)
   lp x1 = liftP (either id ((,) x1))
   in maybe 
   (mult q1 r2)
   (\x1 - lp x1 (plus (mult q1 r2) q2))
   (empty r1))

star r  = Parse (return [])
(\t- liftP (uncurry (:)) (mult (divide r t) (star r)))

p1 :: RegExp g = g Char ([Char], [Char])
p1 = mult (star (check (== 'a'))) (star (check (== 'b')))

testp = parse (star (mult (star (check (== 'a'))) (star (check (== 'b')
abaabaaa
{-
*RX testp
Just [(a,b),(aa,b),(aaa,)]

-}

testc = parse (star one) abracadabra


-- Parsing the list of integers

ieven = even :: Int-Bool
iodd  = odd  :: Int-Bool

p2 :: RegExp g = g Int (Either (Int, (Int, [Int])) (Int, [Int]))
p2 = plus (mult (check iodd) (mult (check iodd) (star (check ieven
(mult (check ieven) (star (check iodd)))

test2 = parse (star p2) [1::Int,1,2,3,3,4]
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Regular Expressions without GADTs

2005-10-17 Thread Ralf Hinze
 The code seems a bit simpler, too.

Do you really think so? To me replacing a GADT by class and
instance declarations seems the wrong way round. We should
not forget that the DT in GADT stands for `data type'. One
could certainly argue that the gist of functional programming
is to define a collection of data types and functions that
operate on these types. The fact that with GADTs constructors
can have more general types just allows us to use the functional
design pattern more often (freeing us from the need or temptation
to resort to type class hackery with multiple parameter type
classes, undecidable instances, functional dependencies etc).

Cheers, Ralf
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe