Re: LR(1) grammar/parser and lookahead-restrictions

Dmitry Soshnikov Tue, 14 Feb 2017 21:31:59 -0800

On Mon, Jan 23, 2017 at 10:46 PM, Dmitry Soshnikov <
[email protected]> wrote:


>
> On Mon, Jan 23, 2017 at 5:24 PM, Waldemar Horwat <[email protected]>
> wrote:
>
>> On 01/11/2017 10:28, Michael Dyck wrote:
>>
>>> In the past, it has been said (usually by Brendan Eich) that TC39
>>> intends that the ECMAScript grammar be LR(1). Is that still the case? (I'm
>>> not so much asking about the "1", but more about the "LR".)
>>>
>>> If so, I'm wondering how lookahead-restrictions (e.g., [lookahead <!
>>> terminals]) fit into the LR approach. It seems like there are three
>>> possibilities:
>>>   - modify the grammar (before the parser is constructed),
>>>   - modify the construction of the parser, and/or
>>>   - modify the operation of the parser.
>>> It's not clear to me that any of these (or any combination of these)
>>> will work.
>>>
>>> Has anyone algorithmically generated an LR parser from the spec grammar?
>>> If so, how did you you handle lookahead-restrictions?
>>>
>>
>> I'm the source of the LR(1) condition.  I've been doing that ever since
>> ECMAScript Edition 3, and in fact am using the LR parser to help design and
>> debug the spec.  I have an implementation of the parser with a few
>> extensions to the LR grammar, including support for parametrized
>> productions, lookahead restrictions, and lexer-parser feedback used to
>> disambiguate things such as what token / will start.  I validate various
>> things such as making sure that there is no place where both an identifier
>> and a regular expression can occur.
>>
>>
> Thanks for the confirmation.
>
> > lookahead restrictions
>
> Yeah, all the lookahead restrictions, such as `lookahead ∉ {{, function}`
> add a lot of extra "No<Lookahead>" productions, which makes the grammar
> (and hence the parsing table) bigger unfortunately. Example is e.g.
> `ExpressionNoIn`, `VariableDeclarationNoIn` from the spec, and similar for
> `...NoLeftCurrly`, and `...NoFunction` to support those lookahead
> restrictions, and allow `ExpressionStatement` to be parsed without
> "reduce/reduce" conflicts.
>
> That's why in the mentioned "sub-set" of the ECMAScript example language
> [1] I chose not to use the same syntax for function expression and function
> declaration: `fn foo() {}` for declaration, and just an arrow function for
> function-expressions :) -- no ambiguity, no lookahead restrictions, hence
> no larger grammar, hence a faster parser.
>
>  > lexer-parser feedback
>
> Yes, having a lexer/parser state solves it more elegantly.
>


FYI: I just built this as a proof of concept for ECMAScript's `{` and `}`,
and have parsed normally `{ }` in statement position as a `BlockStatement`,
and in the expression position as an `ObjectLiteral`. With special
"activation productions" one can change the lexer state, and yield
different token types for the same characters.

See the example here:
https://github.com/DmitrySoshnikov/syntax/blob/master/examples/parser-lexer-communication.g

And some details in the docs:
https://github.com/DmitrySoshnikov/syntax#access-tokenizer-from-parser-semantic-actions

Dmitry

_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss

Re: LR(1) grammar/parser and lookahead-restrictions

Reply via email to