On Mon, Jan 30, 2017 at 9:26 AM, Michael Dyck <[email protected]> wrote:
> On 17-01-30 12:15 AM, Dmitry Soshnikov wrote: > >> >> As mentioned above, there are two ways to handle lookahead restrictions in >> LR-parser: >> >> 1. Using "No<Lookahead>" productions. This unfortunately may double number >> of productions in some sub-grammar. E.g. to handle Block vs. ObjectLiteral >> in ECMAScript, one needs to double all the expression productions to >> correctly parse `ExpressionStatement`. As an example in the spec itself, >> take a look e.g. at `RelationalExpression` and `RelationalExpressionNoIn`, >> and how it repeats (doubles) all the productions but with "NoIn" suffix >> https://es5.github.io/#x11.8 >> > > So, e.g., one could take the production > IterationStatement : ... [lookahead != 'let'] LHSExpr ... > and transform it into > IterationStatement : ... LHSExpr-that-does-not-start-with-let ... > and then generate productions for the new nonterminal, e.g. > LHSExpr-that-does-not-start-with-let : > New-Expression-that-does-not-start-with-let > Call-Expression-that-does-not-start-with-let > etc, (eventually bottoming out when a RHS of the original grammar will > either always or never start with 'let'). > > Correct, and this is how ES spec itself handles all those "No<lookahead>" duplicated productions. > The problem with this approach is that, in general, a > lookahead-restriction is not a restriction on the nonterminal that > immediately follows it in the production, it's a restriction on the next > few input elements, which might be generated by more than just that > nonterminal. Well, in this case it's not a "lookahead" anymore. You could have k=2,3,...N for lookahead in parser generators, but it's not like parse the whole non-terminal, and then see a lookahead after it. This "parse whole non-terminal till the end" might be parse the whole program. In this case a cover grammar is simpler, and that's why it's used starting since ES6 spec. Take a look e.g. at `CoverParenthesizedExpressionAndArrowParameterList` from ES2015. A simplified example: https://github.com/DmitrySoshnikov/syntax/blob/master/examples/lang.bnf#L514-L524 By just looking at: ``` (x, y) ``` you can't tell whether it's a grouping operator (`SequenceExpression` node), or a beginning of a lambda function (i.e. its parameters node). So in the generic `ParenthisizedExpression` production, you reinterpret it either to lambda parameters, or keep just a grouping operator, if there is no `LambdaTail`. Here it's a function of course, though for this you have to parse `(x, y)` not as `LambdaParameters`, and not as `SequenceExpression`, but as a generic `ParenthisizedExpression`, and using Cover grammar: ``` (x, y) -> { return x + y; } ``` > I'm not saying this is insurmountable, but it does seem to get messy > fairly quickly. > > And it's unclear how this approach would handle a lookahead-restriction at > the end of a production. Yes, it adds a bunch of duplicated productions to the grammar, which on practice may increases number of parsing states simnifically. As a real practical example: ES5 spec in LALR(1) mode would increase from ~350 states to 470. > > > > 2. Another approach is to use Cover grammar. >> > > Is there a way to automatically (a) construct the cover grammar and (b) > reconstruct the parse tree as if parsed by the original grammar, or do you > code it manually? > > Usually it's a manual process of designing a grammar, and defining post-processing static semantics. Take a look again in the example language I gave above, it have couple of case of cover grammar. > > P.S.: there is actually option #3 -- to use "S-attributed" parser actions, >> it's when you may execute handling action not only at the end of >> production >> (i.e. not only on "reduce"), but after each element on RHS: >> > > I'm doubtful this would help. I wouldn't expect the LR-parser-generator to > 'understand' the content of actions, so it can't use them to alter the > generation of the parser, and so the LR automaton will have lots of > conflicts (because it can't 'separate' the alternatives that the > lookahead-restrictions are there to separate). > > Oh, it'll work for sure, depending on how powerful a parser generator is. Yes, technically it'll be a "reduce-reduce" conflict in the table, but parser will never enter this state based on the semantic action executed *before* entering some non-terminal (not after). In this "before" action you can communicated to lexer, and check the lookahead. Dmitry
_______________________________________________ es-discuss mailing list [email protected] https://mail.mozilla.org/listinfo/es-discuss

