On Mon, Jan 23, 2017 at 10:46 PM, Dmitry Soshnikov < [email protected]> wrote:
> > On Mon, Jan 23, 2017 at 5:24 PM, Waldemar Horwat <[email protected]> > wrote: > >> On 01/11/2017 10:28, Michael Dyck wrote: >> >>> In the past, it has been said (usually by Brendan Eich) that TC39 >>> intends that the ECMAScript grammar be LR(1). Is that still the case? (I'm >>> not so much asking about the "1", but more about the "LR".) >>> >>> If so, I'm wondering how lookahead-restrictions (e.g., [lookahead <! >>> terminals]) fit into the LR approach. It seems like there are three >>> possibilities: >>> - modify the grammar (before the parser is constructed), >>> - modify the construction of the parser, and/or >>> - modify the operation of the parser. >>> It's not clear to me that any of these (or any combination of these) >>> will work. >>> >>> Has anyone algorithmically generated an LR parser from the spec grammar? >>> If so, how did you you handle lookahead-restrictions? >>> >> >> I'm the source of the LR(1) condition. I've been doing that ever since >> ECMAScript Edition 3, and in fact am using the LR parser to help design and >> debug the spec. I have an implementation of the parser with a few >> extensions to the LR grammar, including support for parametrized >> productions, lookahead restrictions, and lexer-parser feedback used to >> disambiguate things such as what token / will start. I validate various >> things such as making sure that there is no place where both an identifier >> and a regular expression can occur. >> >> > Thanks for the confirmation. > > > lookahead restrictions > > Yeah, all the lookahead restrictions, such as `lookahead ∉ {{, function}` > add a lot of extra "No<Lookahead>" productions, which makes the grammar > (and hence the parsing table) bigger unfortunately. Example is e.g. > `ExpressionNoIn`, `VariableDeclarationNoIn` from the spec, and similar for > `...NoLeftCurrly`, and `...NoFunction` to support those lookahead > restrictions, and allow `ExpressionStatement` to be parsed without > "reduce/reduce" conflicts. > > That's why in the mentioned "sub-set" of the ECMAScript example language > [1] I chose not to use the same syntax for function expression and function > declaration: `fn foo() {}` for declaration, and just an arrow function for > function-expressions :) -- no ambiguity, no lookahead restrictions, hence > no larger grammar, hence a faster parser. > > > lexer-parser feedback > > Yes, having a lexer/parser state solves it more elegantly. > FYI: I just built this as a proof of concept for ECMAScript's `{` and `}`, and have parsed normally `{ }` in statement position as a `BlockStatement`, and in the expression position as an `ObjectLiteral`. With special "activation productions" one can change the lexer state, and yield different token types for the same characters. See the example here: https://github.com/DmitrySoshnikov/syntax/blob/master/examples/parser-lexer-communication.g And some details in the docs: https://github.com/DmitrySoshnikov/syntax#access-tokenizer-from-parser-semantic-actions Dmitry
_______________________________________________ es-discuss mailing list [email protected] https://mail.mozilla.org/listinfo/es-discuss

