On 02/04/2017 07:20, Michael Dyck wrote:
On 17-02-03 05:32 PM, Waldemar Horwat wrote:
On 02/03/2017 08:17, Michael Dyck wrote:
On 17-02-02 06:23 PM, Waldemar Horwat wrote:

Lookahead restrictions fit very well into an LR(1) engine

Again: Great, but how? E.g., do you pre-process the grammar, modify the
construction of the automaton, and/or modify the operation of the parser?

For each state × token combination, the automaton describes what happens
when you're in state S and see a token T.  The lookahead restrictions remove
possible transitions; without them there would be ambiguities where a given
state × token combination would want to do two incompatible things.

Okay, so do you generate the automaton (ignoring lookahead restrictions) and 
then remove transitions (using lookahead restrictions)? Or do you integrate the 
lookahead-restrictions into the generation of the automaton?

It's integrated.  You can't generate a valid automaton without the lookahead 
restrictions.

That's different from parametrized rules, which simply macro-expand into
lots of individual rules.

Yup.


But the context-dependentness of lexing is a parse-time problem, not a
validation-time problem, right?

No.

The syntactic level can just assume a stream of (correctly lexed) input
elements.

(I should have said "*Validation* of the syntactic level [etc]")

No!  It's impossible to create a stream of correctly lexed input elements
without doing syntactic level parsing.

I quite agree. I didn't mean to suggest otherwise. What I mean is that, once 
you've generated the automaton for the syntactic grammar, you can just look at 
each state's set of expected terminals, and from that deduce the goal symbol 
that the lexer will need to use to get the next token when the parser is in 
that state. The point being that you can do that *after* generating the 
syntactic automaton. So the context-dependentness of lexing doesn't have to 
affect the process of generating the syntactic automaton.

That's correct.

(That's assuming an LR(1)-ish parser, and an approach where you don't try to 
combine the syntactic and lexical grammars to generate a single [scannerless] 
automaton. Which may not be your approach.)

The parser and lexer stay separate, other than the lexer providing tokens to 
the parser and the parser selecting one of several top-level lexer goal symbols 
for lexing the next token.  I do not use any kind of unified parser-lexer 
grammar; that could run into issues such as the syntactic grammar making lexing 
non-greedy: a+++++b lexing as a ++ + ++ b instead of the correct a ++ ++ + b 
(which would then generate a syntax error at the syntactic level).

The validator looks for problems such as the syntactic grammar giving the
lexer contradictory instructions. An example would be any syntactic
grammar automaton state where one outgoing syntactic transition would
swallow a regexp token and another outgoing syntactic transition from the
same state would swallow a / token. If any such state exists, the grammar
is broken.

Yup, the spec asserts the non-existence of such states ("syntactic grammar 
contexts").

-Michael

    Waldemar


_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss

Reply via email to