This is a rather belated summary of the replies to my earlier query
about the library of parsing combinators which comes with the hbc
compiler. It is based largely on advice from Alastair Reid, Stephen J
Bevan and Ken Sailor; thanks to all of them for helping me out.
(Incidentally, for anyone who wants to use the library but is not
familiar with parsing combinators, I recommend Graham Hutton's article,
"Parsing using combinators", which appears in the proceedings of the
1989 Glasgow functional programming workshop. This has been published
by Springer, in their Workshops in Computing Science series; the
editors are Kei Davis and John Hughes.)
My question was about the class $Token$, whose signature is:
>class (Text a) => Token a where {
> compareT :: a -> a -> OrderedT;
> stringT :: a -> String;
> positionT :: a -> String;
> eqT :: a -> a -> Bool
> }
I wasn't sure of the intended meaning of this class or of how it is
used in the definitions of the library functions. Now that I've seen
a few replies to my query and done a small experiment I've come to the
conclusion that the library uses $eqT$ to test for equality but
doesn't use any of the other methods. This means that it is possible
to define tokens which store any amount of extra information (such
as, for example, their position --- this would be of use in testing
for offside-ness) because we can control the way in which the parsing
combinators look at two tokens to decide whether or not they are
lexically equivalent.
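To make the point about extra information concrete, here is a minimal sketch of a token that carries its position. It is written in present-day Haskell and does not use the library: the class below is a local stand-in modelled on the signature quoted above, with Show in place of Text and with compareT left out. The names Kind, PosToken, x1, x2 and n1 are made up for the illustration.

```haskell
-- Local stand-in for the library's Token class (an assumption, not the
-- hbc source). Show replaces the Text superclass; compareT is omitted.
class Show a => Token a where
  stringT   :: a -> String
  positionT :: a -> String
  eqT       :: a -> a -> Bool

-- Hypothetical lexical classes.
data Kind = Ident | Number
  deriving (Eq, Show)

-- A token carries its class, its source text and a (line, column) pair.
data PosToken = PosToken Kind String (Int, Int)
  deriving (Eq, Show)

instance Token PosToken where
  stringT   (PosToken _ s _) = s
  positionT (PosToken _ _ p) = show p
  -- Only the lexical class is compared; text and position are ignored.
  eqT (PosToken k1 _ _) (PosToken k2 _ _) = k1 == k2

-- Two identifiers at different positions, and one number.
x1, x2, n1 :: PosToken
x1 = PosToken Ident  "x"  (1, 1)
x2 = PosToken Ident  "y"  (3, 5)
n1 = PosToken Number "42" (3, 7)
```

With these definitions eqT x1 x2 is True although x1 == x2 is False, and eqT x1 n1 is False. The derived Eq still distinguishes the tokens, so nothing is lost by making eqT coarse.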
Here's a short(ish) example of an instantiation of the $Token$ class. It
defines a type $LexToken$; it is intended that a string of LexTokens
will be produced by a lexer and then fed into a parser. Of course, in
the parser we are only interested in the class of a token, rather than
its actual value --- therefore, the equality test "hides" the extra
information.
>module LexToken(LexToken(..), lexer, ParseLib..) where
>import ParseLib
>data LexToken = Vname String
> | Integer String
> | Nop0 String
> | Nop1 String
> | Lop1 String
> | Lop2 String
> | Rop1 String
> | Rop2 String
> | LPar String
> | RPar String
> deriving (Eq, Text)
>instance Token LexToken where
>
> compareT x y = UnT
>
> eqT (Vname _) (Vname _) = True
> eqT (Integer _) (Integer _) = True
> eqT (Nop0 _) (Nop0 _) = True
> eqT (Nop1 _) (Nop1 _) = True
> eqT (Lop1 _) (Lop1 _) = True
> eqT (Lop2 _) (Lop2 _) = True
> eqT (Rop1 _) (Rop1 _) = True
> eqT (Rop2 _) (Rop2 _) = True
> eqT (LPar _) (LPar _) = True
> eqT (RPar _) (RPar _) = True
> eqT _ _ = False -- watch out for this
>
> stringT (Vname x) = x
> stringT (Integer x) = x
> stringT (Nop0 x) = x
> stringT (Nop1 x) = x
> stringT (Lop1 x) = x
> stringT (Lop2 x) = x
> stringT (Rop1 x) = x
> stringT (Rop2 x) = x
> stringT (LPar x) = x
> stringT (RPar x) = x
>
> positionT x = ""
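As a rough illustration of how a parser might consume such tokens, the sketch below defines a one-token recogniser on top of eqT. It is not the library's code: the class is a cut-down local stand-in (Show in place of Text, only stringT and eqT), LexToken is reduced to four constructors, and tokenLike and input are hypothetical names.

```haskell
-- Cut-down local stand-in for the Token class (an assumption, not ParseLib).
class Show a => Token a where
  stringT :: a -> String
  eqT     :: a -> a -> Bool

-- A reduced version of the LexToken type, for brevity.
data LexToken = Vname String
              | Integer String
              | LPar String
              | RPar String
  deriving (Eq, Show)

instance Token LexToken where
  stringT (Vname x)   = x
  stringT (Integer x) = x
  stringT (LPar x)    = x
  stringT (RPar x)    = x

  -- Tokens are equivalent when they are built with the same constructor.
  eqT (Vname _)   (Vname _)   = True
  eqT (Integer _) (Integer _) = True
  eqT (LPar _)    (LPar _)    = True
  eqT (RPar _)    (RPar _)    = True
  eqT _           _           = False

-- Hypothetical one-token parser: succeed when the next token is in the
-- same class as the pattern, returning that token and the remaining input.
tokenLike :: Token a => a -> [a] -> Maybe (a, [a])
tokenLike pat (t : ts) | eqT pat t = Just (t, ts)
tokenLike _   _                    = Nothing

-- Sample lexer output.
input :: [LexToken]
input = [Vname "count", Integer "7"]
```

Here tokenLike (Vname "") input succeeds with the token Vname "count", whose text can then be read back with stringT, while tokenLike (Integer "") input fails. The string inside the pattern token plays no part in the match, which is the sense in which the equality test hides the payload.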
$positionT$ may be used to return the "extra information". The general
opinion amongst my advisors is that it returns a string because that
is the most flexible return type (remember that since we can derive a
Text instance for any algebraic type it is easy to convert values to
and from strings). In the declaration above, $positionT$ just returns
the empty string; the selector $stringT$ seems more appropriate for
conveying the only extra information which we have.
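The string-as-interchange idea can be sketched as follows, again in present-day Haskell, where the derived Show and Read instances do the job that a derived Text instance does in the paragraph above. The class is a local stand-in without compareT, and Pos, Located, tokenPos and tok are names invented for the example.

```haskell
-- Local stand-in for the Token class (an assumption, not the hbc source).
class Show a => Token a where
  stringT   :: a -> String
  positionT :: a -> String
  eqT       :: a -> a -> Bool

-- Hypothetical structured position: line and column.
data Pos = Pos Int Int
  deriving (Eq, Show, Read)

-- A token's text together with where it was found.
data Located = Located String Pos
  deriving (Eq, Show)

instance Token Located where
  stringT   (Located s _) = s
  -- Encode the structured position as a string via the derived Show.
  positionT (Located _ p) = show p
  -- Equality looks at the text only, never at the position.
  eqT a b = stringT a == stringT b

-- Decode the position again via the derived Read. This is partial: it
-- fails at run time if positionT returns something that is not a Pos.
tokenPos :: Token a => a -> Pos
tokenPos = read . positionT

-- Sample token.
tok :: Located
tok = Located "where" (Pos 3 5)
```

For this token positionT tok is the string "Pos 3 5" and tokenPos tok gives back Pos 3 5, so an offside test could compare columns without the class having to know about the Pos type.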
That's it . . . sorry for the delay, and thanks again to all the
people who helped me out.
balan