Re: LR(1) grammar/parser and lookahead-restrictions

Waldemar Horwat Fri, 03 Feb 2017 14:33:12 -0800

On 02/03/2017 08:17, Michael Dyck wrote:

On 17-02-02 06:23 PM, Waldemar Horwat wrote:


Lookahead restrictions fit very well into an LR(1) engine


Again: Great, but how? E.g., do you pre-process the grammar, modify the 
construction of the automaton, and/or modify the operation of the parser?


For each state × token combination, the automaton describes what happens when 
you're in state S and see a token T.  The lookahead restrictions remove 
possible transitions; without them there would be ambiguities where a given 
state × token combination would want to do two incompatible things.

That's different from parametrized rules, which simply macro-expand into lots 
of individual rules.

as long as they're limited to only one token, and that's what I've
implemented in the validator.


So when the grammar has a prohibited lookahead-sequence with more than one 
token (in ExpressionStatement, IterationStatement, and ExportDeclaration), does 
the validator just use the first token?


My LR(1) validator can't actually handle the case of multiple tokens of 
lookahead, and I didn't feel like doing an LR(2) grammar.  I had to check these 
by hand.

You need to be very careful with them if looking more than one token
ahead because lexing of the tokens can vary based on context. For
example, if the next few characters in front of the cursor are )/abc/i+,
then what is the second token? What is the third token? It's actually
context-dependent.


But the context-dependentness of lexing is a parse-time problem, not a 
validation-time problem, right?

No.

The syntactic level can just assume a stream of (correctly lexed) input 
elements.


No!  It's impossible to create a stream of correctly lexed input elements 
without doing syntactic level parsing.  See the example I gave above.

Or do you validate deeper than the syntactic grammar?


Yes.  The syntactic grammar controls the lexer.  The validator looks for 
problems such as the syntactic grammar giving the lexer contradictory 
instructions.  An example would be any syntactic grammar automaton state where 
one outgoing syntactic transition would swallow a regexp token and another 
outgoing syntactic transition from the same state would swallow a / token.  If 
any such state exists, the grammar is broken.

(And it seems to me that multi-token lookahead-restrictions would be hard for a 
validator to handle even if lexing *wasn't* context-dependent, but maybe that 
depends on how you handle them.)

-Michael


_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss

Re: LR(1) grammar/parser and lookahead-restrictions

Reply via email to