On 05/27/11 16:00, Brendan Eich wrote:
On May 27, 2011, at 12:27 PM, Waldemar Horwat wrote:

Peter Hallam kindly offered to help come up with a new grammar formalism for the spec 
that can pass the "Waldemar test" (if that is possible; not as hard as the 
Turing test). IIRC Peter said he was adding (or had added, or would add) arrow support per the strawman 
to Traceur (http://code.google.com/p/traceur-compiler/). We talked about Narcissus 
support too, to get more user testing.

If we need to come up with a new formalism, that's a very powerful signal that 
there's something seriously flawed in the design.

Or the spec.

LR(1) is good, I like it, but all the browser JS implementations, and Rhino, 
use top-down hand-crafted parsers, even though JS is not LL(1). That is a big 
disconnect between spec and reality.

As you've shown, these can look good but be future-hostile or downright buggy, 
so we need a formalism that permits mechanical checking for ambiguities. We 
don't want two ways to parse a sentence in the language.

But this does not mean we must stick with LR(1).


Even if it happens to work now, it will produce surprises down the road as we try to 
extend the expression or parameter grammar. The places where the grammar is not LR(1) 
in C++ are some of the most frustrating and surprising ones for users to deal with, and 
C++ does not even have the feedback from the parser to the lexer. Perl does and its 
grammar is both ambiguous and undecidable as a result. Note that implementations of Perl 
exist, which in this case simply means that the documented Perl "spec" is not 
sound or faithful -- all implementations are in fact taking shortcuts not reflected in 
the spec.

The problem is we are already cheating.

/AssignmentExpression/ :
/ConditionalExpression/
/LeftHandSideExpression/ = /AssignmentExpression/
/LeftHandSideExpression/ /AssignmentOperator/ /AssignmentExpression/

This produces expressions such as 42 = foo(), which must be handled by semantic 
specification. Why can't we have a more precise grammar?

This is an entirely different issue.  The LeftHandSideExpression is still 
evaluated as an expression; you just don't call GetValue on it.  We chose to 
prohibit 42 = foo(); we could equally well have chosen to prohibit foo = 42(), 
but neither situation has much to do with the grammar.

Building on this, destructuring assignment parses more of what was formerly 
rejected by semantic checking: {p: q} = o destructures o.p into q (which must 
be declared in Harmony -- it is an error if no such q was declared in scope).

We can certainly write semantic rules for destructuring to validate the object 
literal as an object pattern; ditto arrays. But the LR(1) grammar does not by 
itself specify the valid sentences of the language, just as it has not all these 
years for assignment expressions.
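
To make the "parse loosely, then validate semantically" approach concrete, here is a 
minimal sketch of such a check. It is not taken from the spec or from any engine; the 
AST node shapes and the name isValidAssignmentTarget are assumptions invented for 
illustration. The left side is assumed to have been parsed already as an ordinary 
expression, and the check then decides whether it is an acceptable target or pattern.

```javascript
// Hypothetical AST node shapes, invented for this sketch:
//   { type: "Identifier", name }
//   { type: "Member", object, property }                    // o.p
//   { type: "Number", value }                               // 42
//   { type: "ObjectLiteral", properties: [{ key, value }] } // {p: q}
//   { type: "ArrayLiteral", elements: [...] }               // [a, b]

// Returns true if `node`, already parsed as an expression, may stand
// to the left of `=`, either as a simple target or as a pattern.
function isValidAssignmentTarget(node) {
  switch (node.type) {
    case "Identifier":
    case "Member":
      return true;
    case "ObjectLiteral":
      // Reinterpret the literal as an object pattern: every property
      // value must itself be a valid target.
      return node.properties.every((p) => isValidAssignmentTarget(p.value));
    case "ArrayLiteral":
      // Wrap in an arrow so only the element is passed, not the index.
      return node.elements.every((e) => isValidAssignmentTarget(e));
    default:
      // Literals, calls, and so on parse fine but are rejected here.
      return false;
  }
}

// {p: q} = o : the object literal is acceptable as a pattern.
const pattern = {
  type: "ObjectLiteral",
  properties: [{ key: "p", value: { type: "Identifier", name: "q" } }],
};
console.log(isValidAssignmentTarget(pattern)); // true

// 42 = foo() : accepted by the grammar, rejected by the check.
console.log(isValidAssignmentTarget({ type: "Number", value: 42 })); // false
```

The sketch leaves out the scope check mentioned above (whether q is declared), which 
would need information the syntax tree alone does not carry.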

Now, for arrow functions (you already know this, just reciting for the 
es-discuss list) we could parse the /ArrowFormalParameters/ : /Expression/ and 
then write semantics to validate that comma expression as arrow function formal 
parameters.

Right now, the expression grammar and the formal parameter list grammar are 
"close". They have already diverged in Harmony due to rest and spread not being 
lookalikes: spread (http://wiki.ecmascript.org/doku.php?id=harmony:spread) allows ... 
/AssignmentExpression/ while rest wants only ... /Identifier/.

But we still can cope: the /Expression/ grammar is a cover grammar for 
/FormalParameterList/.
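
A rough sketch of what using /Expression/ as a cover grammar could look like: parse 
what is inside the parentheses as an expression, and only once the arrow is seen 
reinterpret that tree as a parameter list. Everything here is hypothetical (the node 
shapes, the function name toFormalParameters, the use of null for empty parentheses). 
It also assumes the cover grammar has been widened to admit () and a trailing 
... /Identifier/, neither of which is a valid /Expression/ on its own.

```javascript
// Hypothetical AST node shapes, invented for this sketch:
//   { type: "Identifier", name }
//   { type: "Comma", expressions: [...] }   // a, b, c
//   { type: "Spread", argument }            // ...expr
//   any other type (e.g. "Call") is an ordinary expression

// Reinterprets an already-parsed expression as arrow formal parameters.
// `expr` is null for empty parentheses. Throws SyntaxError when the
// expression has no reading as a parameter list.
function toFormalParameters(expr) {
  if (expr === null) return [];
  const items = expr.type === "Comma" ? expr.expressions : [expr];
  return items.map((item, i) => {
    if (item.type === "Identifier") {
      return { name: item.name, rest: false };
    }
    if (item.type === "Spread") {
      // Spread allows any assignment expression; rest wants only an identifier.
      if (item.argument.type !== "Identifier") {
        throw new SyntaxError("rest parameter must be a plain identifier");
      }
      if (i !== items.length - 1) {
        throw new SyntaxError("rest parameter must come last");
      }
      return { name: item.argument.name, rest: true };
    }
    throw new SyntaxError(item.type + " is not a valid formal parameter");
  });
}

// (a, ...b) -> two parameters, the second a rest parameter.
const ok = {
  type: "Comma",
  expressions: [
    { type: "Identifier", name: "a" },
    { type: "Spread", argument: { type: "Identifier", name: "b" } },
  ],
};
console.log(toFormalParameters(ok).map((p) => (p.rest ? "..." : "") + p.name).join(", "));
// prints "a, ...b"
```

Something like (a, f()) parses as a comma expression but makes toFormalParameters 
throw, which is the semantic validation step described above.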

Of course, the two sub-grammars may diverge in a way we can't parse via parsing 
a comma expression within the parentheses that come before the arrow. Guards 
seem like they will cause the parameter syntax to diverge, unless you can use 
them in expressions (not in the strawman).

The conclusion I draw from these challenges, some already dealt with 
non-grammatically by ES1-5, is that we should not make a sacred cow out of 
LR(1). We should be open to a formalism that is as checkable for ambiguities, 
and that can cope with the C heritage we already have (assignment expressions), 
as well as new syntax.

Given that LR(1) is the most general grammar available before you start getting 
into serious complexity (it subsumes LALR and other commonly studied grammars), 
there is a big cliff here and I think it's foolish to plan to jump off it 
without completely understanding the consequences.  This is especially true 
because there are other paths available for compact function syntax that do not 
involve jumping off that cliff.

I realize that C++ and Perl put up with ambiguity, and it seriously bites them. 
Quick, what's the difference between the following in C++?

  int x(int());
  int x(-int());

    Waldemar
_______________________________________________
es-discuss mailing list
https://mail.mozilla.org/listinfo/es-discuss
