On 05/31/11 13:34, Brendan Eich wrote:
On May 31, 2011, at 1:21 PM, Waldemar Horwat wrote:
On 05/27/11 19:36, Brendan Eich wrote:
On May 27, 2011, at 6:42 PM, Brendan Eich wrote:
On May 27, 2011, at 6:20 PM, Waldemar Horwat<[email protected]> wrote:
This produces expressions such as 42 = foo(), which must be handled by semantic
specification. Why can't we have a more precise grammar?
This is an entirely different issue. The LeftHandSideExpression is still
evaluated as an expression; you just don't call GetValue on it. We chose to
prohibit 42 = foo(); we could equally well have chosen to prohibit foo = 42(),
but neither situation has much to do with the grammar.
That dodges the big "cover grammar" vs. Precise Grammar issue before us. It
assumes the conclusion tha semantics such as Reference internal types are the way to go,
because LR(1) can't hack it.
Again, real browser JS engines use top-down parsing and no Reference type,
instead specializing the AST for LeftHandSideExpressions to be precise, with
early errors per ES5 (and even in ES3 era) for nonsense-LHS-expressions.
Top-down LL parsers are subsumed by LR parsers. The hierarchy is:
LL(0)< LL(1)< LR(1). LR(0)< SLR(1)< LALR(1)< LR(1).
Yes, I know -- but that is not in practice so helpful to implementors, since
they have to (at least) tediously refactor to remove left recursion.. The
reason to use LR(1) in the spec is not to help implementors, as far as I can
tell.
The reason to use LR(1) that you have cited is to have a validated-unambiguous
grammar using a well-known formalism. It's a good reason, but it applies to
other approaches.
Is there anything particularly compelling about LR(1) vs. say PEG (which is
unambiguous by construction) -- linear time, memory proportional to PDA depth,
or other? I have my own thoughts but I'd be interested in yours.
Yes. I would not want to use anything like a PEG to standardize a grammar.
Here's why:
PEG being unambiguous by construction simply means that it resolves all
ambiguities by picking the earliest rule. This turns all rules following the
first one into negative rules: X matches Z only if it DOESN'T match a Y or a Q
or a B or .... You could pick the same strategy to disambiguate an LR(1)
grammar, and it would be equally bad.
Negative rules are the bane of grammars and behind the majority of the problems
with the C++ grammar, including the examples I listed earlier. They make a
grammar non-understandable because the order of the rules is subtly significant
and makes it hard to reason about when an X matches a Z; a language extension
might expand the definition of Y to make an X no longer match a Q, and you
wouldn't know it just by looking at a grammar with negative rules. In a
positive-rule-only grammar you'd discover the problem right away because the
grammar wouldn't compile.
Negative rules also interact badly with both semicolon insertion and
division-vs-regexp lexer disambiguation. One might naively think that
semicolon insertion would be an ideal match for negative rules: You first try
to parse
tokens-on-line1
tokens-on-line2
as a single statement and, only if that fails, you move on to parsing it as two
statements with a virtual semicolon between them. That, however, doesn't work.
Here's a simple counterexample:
a + b
(c) = d
Negative rules would insert a virtual semicolon here because
a + b(c) = d
is not a valid parse. However, the correct ECMAScript behavior is not to
insert a semicolon.
Waldemar
_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss