Re: May 24-26 rough meeting notes

Waldemar Horwat Tue, 31 May 2011 14:30:45 -0700

On 05/31/11 13:34, Brendan Eich wrote:

On May 31, 2011, at 1:21 PM, Waldemar Horwat wrote:

On 05/27/11 19:36, Brendan Eich wrote:

On May 27, 2011, at 6:42 PM, Brendan Eich wrote:

On May 27, 2011, at 6:20 PM, Waldemar Horwat<[email protected]>   wrote:

This produces expressions such as 42 = foo(), which must be handled by semantic 
specification. Why can't we have a more precise grammar?


This is an entirely different issue.  The LeftHandSideExpression is still 
evaluated as an expression; you just don't call GetValue on it.  We chose to 
prohibit 42 = foo(); we could equally well have chosen to prohibit foo = 42(), 
but neither situation has much to do with the grammar.


That dodges the big "cover grammar" vs. Precise Grammar issue before us. It 
assumes the conclusion tha semantics such as Reference internal types are the way to go, 
because LR(1) can't hack it.


Again, real browser JS engines use top-down parsing and no Reference type, 
instead specializing the AST for LeftHandSideExpressions to be precise, with 
early errors per ES5 (and even in ES3 era) for nonsense-LHS-expressions.


Top-down LL parsers are subsumed by LR parsers.  The hierarchy is:

LL(0)<  LL(1)<  LR(1). LR(0)<  SLR(1)<  LALR(1)<  LR(1).


Yes, I know -- but that is not in practice so helpful to implementors, since 
they have to (at least) tediously refactor to remove left recursion.. The 
reason to use LR(1) in the spec is not to help implementors, as far as I can 
tell.

The reason to use LR(1) that you have cited is to have a validated-unambiguous 
grammar using a well-known formalism. It's a good reason, but it applies to 
other approaches.

Is there anything particularly compelling about LR(1) vs. say PEG (which is 
unambiguous by construction) -- linear time, memory proportional to PDA depth, 
or other? I have my own thoughts but I'd be interested in yours.


Yes.  I would not want to use anything like a PEG to standardize a grammar.  
Here's why:

PEG being unambiguous by construction simply means that it resolves all 
ambiguities by picking the earliest rule.  This turns all rules following the 
first one into negative rules:  X matches Z only if it DOESN'T match a Y or a Q 
or a B or ....  You could pick the same strategy to disambiguate an LR(1) 
grammar, and it would be equally bad.

Negative rules are the bane of grammars and behind the majority of the problems 
with the C++ grammar, including the examples I listed earlier.  They make a 
grammar non-understandable because the order of the rules is subtly significant 
and makes it hard to reason about when an X matches a Z; a language extension 
might expand the definition of Y to make an X no longer match a Q, and you 
wouldn't know it just by looking at a grammar with negative rules.  In a 
positive-rule-only grammar you'd discover the problem right away because the 
grammar wouldn't compile.

Negative rules also interact badly with both semicolon insertion and 
division-vs-regexp lexer disambiguation.  One might naively think that 
semicolon insertion would be an ideal match for negative rules:  You first try 
to parse

  tokens-on-line1
  tokens-on-line2

as a single statement and, only if that fails, you move on to parsing it as two 
statements with a virtual semicolon between them.  That, however, doesn't work. 
 Here's a simple counterexample:

  a + b
  (c) = d

Negative rules would insert a virtual semicolon here because

  a + b(c) = d

is not a valid parse.  However, the correct ECMAScript behavior is not to 
insert a semicolon.

    Waldemar
_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss

Re: May 24-26 rough meeting notes

Reply via email to