On 05/27/11 16:00, Brendan Eich wrote:
On May 27, 2011, at 12:27 PM, Waldemar Horwat wrote:
Peter Hallam kindly offered to help come up with a new grammar formalism for the spec
that can pass the "Waldemar test" (if that is possible; not as hard as the
Turing test). IIRC Peter said he was adding (or had added, or would add) arrow support per the strawman
to Traceur (http://code.google.com/p/traceur-compiler/). We talked about Narcissus
support too, to get more user testing.
If we need to come up with a new formalism, that's a very powerful signal that
there's something seriously flawed in the design.
Or the spec.
LR(1) is good, I like it, but all the browser JS implementations, and Rhino,
use top-down hand-crafted parsers, even though JS is not LL(1). That is a big
disconnect between spec and reality.
As you've shown these can look good but be future hostile or downright buggy,
so we need a formalism that permits mechanical checking for ambiguities. We
don't want two ways to parse a sentence in the language.
But this does not mean we must stick with LR(1).
Even if it happens to work now, it will produce surprises down the road as we try to
extend the expression or parameter grammar. The places where the grammar is not LR(1)
in C++ are some of the most frustrating and surprising ones for users to deal with, and
C++ does not even have the feedback from the parser to the lexer. Perl does and its
grammar is both ambiguous and undecidable as a result. Note that implementations of Perl
exist, which in this case simply means that the documented Perl "spec" is not
sound or faithful -- all implementations are in fact taking shortcuts not reflected in
the spec.
The problem is we are already cheating.
/AssignmentExpression/ :
/ConditionalExpression/
/LeftHandSideExpression/ = /AssignmentExpression/
/LeftHandSideExpression/ /AssignmentOperator/ /AssignmentExpression/
This produces expressions such as 42 = foo(), which must be handled by semantic
specification. Why can't we have a more precise grammar?
This is an entirely different issue. The LeftHandSideExpression is still
evaluated as an expression; you just don't call GetValue on it. We chose to
prohibit 42 = foo(); we could equally well have chosen to prohibit foo = 42(),
but neither situation has much to do with the grammar.
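For readers following along, the kind of post-parse check under discussion can be sketched roughly as below. This is only an illustration, not spec text: the AST node shapes and the name checkAssignmentTarget are hypothetical, the error type is an arbitrary choice, and the sketch models nothing beyond rejecting a literal such as 42 on the left-hand side.

```javascript
// Hypothetical AST node shapes (assumptions, not from any real parser):
//   { type: "Literal", value: 42 }
//   { type: "Identifier", name: "foo" }
//   { type: "Call", callee: node, args: [node, ...] }
//   { type: "Assignment", left: node, right: node }

// The grammar accepts any LeftHandSideExpression on the left of "=".
// This separate semantic pass rejects a literal target, which can never
// be assigned to. The error type used here is a choice of this sketch.
function checkAssignmentTarget(assignment) {
  if (assignment.left.type === "Literal") {
    throw new Error("Invalid assignment target: literal");
  }
  return assignment;
}
```

Under these assumptions, a node for 42 = foo() is rejected by the check, while a node for foo = 42 passes through unchanged, even though both are derived by the same production.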
Building on this, destructuring assignment parses more of what was formerly
rejected by semantic checking: {p: q} = o destructures o.p into q (which must
be declared in Harmony -- it is an error if no such q was declared in scope).
We can certainly write semantic rules for destructuring to validate the object
literal as an object pattern; ditto arrays. But the LR(1) grammar by itself does
not specify the valid sentences in the language, just as it did not all these
years for assignment expressions.
Now, for arrow functions (you already know this, just reciting for the
es-discuss list) we could parse the /ArrowFormalParameters/ : /Expression/ and
then write semantics to validate that comma expression as arrow function formal
parameters.
Right now, the expression grammar and the formal parameter list grammar are
"close". They have already diverged in Harmony due to rest and spread not being
lookalikes: spread (http://wiki.ecmascript.org/doku.php?id=harmony:spread) allows ...
/AssignmentExpression/ while rest wants only ... /Identifier/.
But we still can cope: the /Expression/ grammar is a cover grammar for
/FormalParameterList/.
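The cover-grammar idea described here can be sketched as a second pass over an already-parsed expression. This is a minimal illustration under stated assumptions: the AST node shapes and the name toFormalParameters are hypothetical, and only plain identifiers are accepted (no rest parameters, defaults, or patterns).

```javascript
// Hypothetical AST node shapes (assumptions, not from any real parser):
//   { type: "Identifier", name: "a" }
//   { type: "Literal", value: 42 }
//   { type: "Comma", items: [node, ...] }   // a parsed comma expression

// Reinterpret the expression parsed between the parentheses as arrow
// function formal parameters, or throw if it is not a valid list.
// Throwing SyntaxError is a choice of this sketch.
function toFormalParameters(expr) {
  // null stands for an empty pair of parentheses
  if (expr === null) return [];
  const items = expr.type === "Comma" ? expr.items : [expr];
  const names = [];
  for (const item of items) {
    if (item.type !== "Identifier") {
      throw new SyntaxError("Invalid formal parameter: " + item.type);
    }
    names.push(item.name);
  }
  return names;
}
```

With these shapes, the node for (a, b) converts to the parameter names a and b, while the node for (a, 42) is rejected only after parsing, once the arrow has been seen. Each new divergence between expressions and parameters would need another case in a pass like this one.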
Of course, the two sub-grammars may diverge in a way we can't handle by parsing
a comma expression within the parentheses that come before the arrow. Guards
seem like they will cause the parameter syntax to diverge, unless you can use
them in expressions (not in the strawman).
The conclusion I draw from these challenges, some already dealt with
non-grammatically by ES1-5, is that we should not make a sacred cow out of
LR(1). We should be open to a formalism that is as checkable for ambiguities,
and that can cope with the C heritage we already have (assignment expressions),
as well as new syntax.
Given that LR(1) is the most general grammar available before you start getting
into serious complexity (it subsumes LALR and other commonly studied grammars),
there is a big cliff here and I think it's foolish to plan to jump off it
without completely understanding the consequences. This is especially true
because there are other paths available for compact function syntax that do not
involve jumping off that cliff.
I realize that C++ and Perl put up with ambiguity, and it seriously bites them.
Quick, what's the difference between the following in C++?
int x(int());
int x(-int());
Waldemar