> I want to build the layout expansion ('{', '}' and ';')
> into the scanner.

In general you could only do this by building some parsing capability
into the scanner (though in musing about it I haven't found any cases
which couldn't be solved by adding simple "bracket-counting" for
interesting constructs).  It is quite straightforward to integrate the
scanner with the parser so that the parser gives enough information for
the scanner to work correctly.  Unless you *need* a standalone scanner
this is probably the easiest thing to do...

> Since I try to include this expander in the scanner, I tried to figure
> out which the cases might be, in which such "syntactical categories" end,
> independently from the layout.
> 
> The only constructs I found are "(...)", "[...]", "{...}" and "let ... in".

You should think in terms of symbols (and then relate these
back to constructs!).  The following are some that you've missed:

        ",", "then", "else", "..", "|", ";", "->"

Bear in mind that a semicolon or close-brace may itself have been
inserted automatically (perhaps by "error" processing), so you need
to allow for this possibility!

> The question I'm raising is, if anybody knows a general rule for finding
> out this categories (I did it intuitively),

Computing the follow-set of the "}" token should give the symbols which
could cause problems (any standard parsing text should give an algorithm,
but it's usually simple to do intuitively).  Subtract infix operators and
"::" since they will be handled by the "longest parse" rule.

The main problems arise with case expressions, such as:

        [case of x -> case f of p | g -> e1 + e2, 2]

Two close braces must be inserted before the ",".  Since the "|" here
introduces a guard rather than a list comprehension's qualifiers, don't
insert braces before it!
On the other hand, you should insert the close braces before the
"]" in the following example, where the "," is enclosed in parentheses
and so cannot end the case alternatives:

        [case of x -> case f of p | g -> (e1 + e2, 2)]
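
The "bracket-counting" idea can be sketched roughly as follows for these
two examples.  This is only a sketch under strong assumptions: tokens are
plain strings, every "of" opens an implicit block, only "," ")" and "]"
are treated as closing symbols, and indentation is ignored entirely.  The
names `Context` and `insertBraces` are hypothetical, and I have added a
scrutinee `e` to the outer case in the usage note for illustration.

```haskell
-- What is currently open: an implicit layout block, or an explicit
-- bracket together with the token that closes it.
data Context = Implicit | Bracket String
  deriving (Eq, Show)

-- Insert "{" after each "of", and the matching "}" before the first
-- ",", ")" or "]" that belongs to an enclosing explicit bracket.
insertBraces :: [String] -> [String]
insertBraces = go []
  where
    -- End of input: close any implicit blocks that are still open.
    go stack [] = ["}" | Implicit <- stack]
    go stack (t : ts)
      | t == "of"           = t : "{" : go (Implicit : stack) ts
      | t == "("            = t : go (Bracket ")" : stack) ts
      | t == "["            = t : go (Bracket "]" : stack) ts
      -- A comma ends the implicit blocks above the nearest bracket,
      -- but leaves that bracket open.
      | t == ","            = closers ++ t : go rest ts
      -- A closing bracket does the same and also pops the bracket.
      | t `elem` [")", "]"] = closers ++ t : go (drop 1 rest) ts
      | otherwise           = t : go stack ts
      where
        (implicit, rest) = span (== Implicit) stack
        closers          = map (const "}") implicit
```

For instance, `unwords (insertBraces (words "[ case e of x -> case f of p | g -> e1 + e2 , 2 ]"))`
puts both close braces before the ",", while the same call with
`( e1 + e2 , 2 )` in place of `e1 + e2 , 2` leaves the inner "," alone
and puts them before the "]".  The "|" is never treated as a closing
symbol here, which is only right because both examples use it as a guard.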

Regards,
Kevin
