> I want to build the layout expansion ('{', '}' and ';')
> into the scanner.
In general you can only do this by building some parsing capability
into the scanner (though, musing about it, I haven't found any cases
which couldn't be solved by adding simple "bracket-counting" for the
interesting constructs). It is quite straightforward to integrate the
scanner with the parser so that the parser gives the scanner enough
information to work correctly. Unless you *need* a standalone scanner,
this is probably the easiest thing to do...
> Since I try to include this expander in the scanner, I tried to figure
> out which the cases might be, in which such "syntactical categories" end,
> independently from the layout.
>
> The only constructs I found are "(...)", "[...]", "{...}" and "let ... in".
You should think in terms of symbols (and then relate these
back to constructs!). The following are some that you've missed:
",", "then", "else", "..", "|", ";", "->"
Note that a semicolon or close brace may itself have been inserted
automatically (perhaps by "error" processing), so you need to allow
for this possibility!
> The question I'm raising is, if anybody knows a general rule for finding
> out this categories (I did it intuitively),
Computing the follow-set of the "}" token should give the symbols which
could cause problems (any standard parsing text should give an algorithm,
but it's usually simple to do intuitively). Subtract infix operators and
"::", since they will be handled by the "longest parse" rule.
The main problems arise with case expressions, such as:
    [case of x -> case f of p | g -> e1 + e2, 2]
Two close braces must be inserted before the ",". The "|", however, is
a guard rather than part of a list comprehension, so don't insert braces
before it!
On the other hand, you should insert the close braces before the
"]" in the following example:
    [case of x -> case f of p | g -> (e1 + e2, 2)]
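The "bracket-counting" idea can be sketched on exactly these two examples.
What follows is a rough illustration under strong assumptions, not a full
layout algorithm: tokens are plain whitespace-separated strings, only "of"
opens an implicit block, indentation is ignored entirely, and "|" is always
left alone. The names `Ctx`, `closerOf` and `expand` are hypothetical.

```haskell
-- Context stack entries: an implicit block opened after "of", or an
-- explicit bracket together with the closer it expects.
data Ctx = Implicit | Bracket String
  deriving (Eq, Show)

closerOf :: String -> Maybe String
closerOf "(" = Just ")"
closerOf "[" = Just "]"
closerOf "{" = Just "}"
closerOf _   = Nothing

-- Insert the braces for implicit case blocks into a token stream.
expand :: [String] -> [String]
expand = go []
  where
    -- End of input closes every implicit block that is still open.
    go stack [] = ["}" | Implicit <- stack]
    go stack (t : ts)
      -- "of" not followed by an explicit "{" opens an implicit block.
      | t == "of", take 1 ts /= ["{"] = "of" : "{" : go (Implicit : stack) ts
      -- An opening bracket is pushed together with its expected closer.
      | Just c <- closerOf t = t : go (Bracket c : stack) ts
      -- A closer first closes the implicit blocks opened inside its bracket.
      | t `elem` [")", "]", "}"] =
          case rest of
            Bracket c : outer | c == t -> closes ++ t : go outer ts
            _                          -> t : go stack ts  -- unmatched: pass through
      -- A comma belongs to the enclosing bracket, so it also ends any
      -- implicit blocks opened since that bracket.
      | t == "," =
          case rest of
            Bracket _ : _ -> closes ++ "," : go rest ts
            _             -> "," : go stack ts
      | otherwise = t : go stack ts
      where
        (implicits, rest) = span (== Implicit) stack
        closes            = map (const "}") implicits
```

Feeding it the first example as
`unwords (expand (words "[ case of x -> case f of p | g -> e1 + e2 , 2 ]"))`
puts both close braces before the ",", while in the second example the ","
sits inside "(...)" and the braces only appear before the "]".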
Regards,
Kevin