On May 27, 2011, at 12:27 PM, Waldemar Horwat wrote:

>> Peter Hallam kindly offered to help come up with a new grammar formalism for 
>> the spec that can pass the "Waldemar test" (if that is possible; not as hard 
>> as the Turing test). IIRC Peter said he was (had, would) adding arrow 
>> support per the strawman to Traceur 
>> (http://code.google.com/p/traceur-compiler/). We talked about Narcissus 
>> support too, to get more user testing.
> 
> If we need to come up with a new formalism, that's a very powerful signal 
> that there's something seriously flawed in the design.

Or the spec.

LR(1) is good, I like it, but all the browser JS implementations, and Rhino, 
use top-down hand-crafted parsers, even though JS is not LL(1). That is a big 
disconnect between spec and reality.

As you've shown, these can look good but be future-hostile or downright buggy,
so we need a formalism that permits mechanical checking for ambiguities. We
don't want two ways to parse a sentence in the language.
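A small illustration of what "two ways to parse a sentence" means, sketched in
JavaScript over a made-up toy grammar E : E "-" E | "n" (not from any spec).
The hypothetical countParses function counts the distinct parse trees of one
token sequence; a count above 1 means that sentence is ambiguous. A spot check
like this only covers the sentences you feed it, since ambiguity of a
context-free grammar is undecidable in general. That is why a formalism whose
grammars can be checked mechanically, as LR(1) conflict detection does, matters.

```javascript
// Toy, deliberately ambiguous grammar (hypothetical):  E : E "-" E | "n"
// Returns the number of distinct parse trees for `tokens` as an E.
function countParses(tokens) {
  const memo = new Map();

  // Number of ways tokens[i..j) can be parsed as E (memoized).
  function count(i, j) {
    const key = i + "," + j;
    if (memo.has(key)) return memo.get(key);
    let total = 0;
    if (j - i === 1 && tokens[i] === "n") total = 1; // E : "n"
    // E : E "-" E, trying every "-" as the top-level operator.
    for (let k = i + 1; k < j - 1; k++) {
      if (tokens[k] === "-") total += count(i, k) * count(k + 1, j);
    }
    memo.set(key, total);
    return total;
  }

  return count(0, tokens.length);
}
```

For example, countParses(["n", "-", "n", "-", "n"]) returns 2, one tree for
(n - n) - n and one for n - (n - n), which would evaluate differently.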

But this does not mean we must stick with LR(1).


>  Even if it happens to work now, it will produce surprises down the road as 
> we try to extend the expression or parameter grammar.  The places where the 
> grammar is not LR(1) in C++ are some of the most frustrating and 
> surprising ones for users to deal with, and C++ does not even have the 
> feedback from the parser to the lexer.  Perl does and its grammar is both 
> ambiguous and undecidable as a result.  Note that implementations of Perl 
> exist, which in this case simply means that the documented Perl "spec" is not 
> sound or faithful -- all implementations are in fact taking shortcuts not 
> reflected in the spec.

The problem is we are already cheating.

AssignmentExpression :
    ConditionalExpression
    LeftHandSideExpression = AssignmentExpression
    LeftHandSideExpression AssignmentOperator AssignmentExpression

This produces expressions such as 42 = foo(), which the grammar accepts and
which must then be rejected by semantic rules. Why can't we have a more precise
grammar?
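A minimal sketch of that semantic check, assuming a hypothetical AST whose node
type names (Identifier, MemberExpression, Literal, CallExpression,
AssignmentExpression) are ESTree-style and chosen only for illustration. The
grammar has already accepted 42 = foo() as LeftHandSideExpression =
AssignmentExpression; a separate pass then has to reject the left-hand side.

```javascript
// Hypothetical AST node shapes (ESTree-style names, for illustration only).
// The grammar accepts any LeftHandSideExpression before "=", including
// literals and calls, so validity of the target is a separate check.
function isValidSimpleAssignmentTarget(node) {
  switch (node.type) {
    case "Identifier":       // x = ...
    case "MemberExpression": // o.p = ..., o[k] = ...
      return true;
    default:
      // In this sketch everything else is rejected, e.g. a Literal (42)
      // or a CallExpression (foo()).
      return false;
  }
}

// Returns the node unchanged if its target is valid, else throws.
function checkAssignment(node) {
  if (node.type === "AssignmentExpression" &&
      !isValidSimpleAssignmentTarget(node.left)) {
    throw new SyntaxError("Invalid assignment target: " + node.left.type);
  }
  return node;
}

// 42 = foo(): grammatically fine, semantically rejected.
const literalTarget = {
  type: "AssignmentExpression",
  operator: "=",
  left: { type: "Literal", value: 42 },
  right: {
    type: "CallExpression",
    callee: { type: "Identifier", name: "foo" },
    arguments: [],
  },
};

// x = foo(): same shape with an Identifier on the left, accepted.
const identifierTarget = { ...literalTarget, left: { type: "Identifier", name: "x" } };
```

checkAssignment(literalTarget) throws a SyntaxError, while
checkAssignment(identifierTarget) returns its argument unchanged.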

Building on this, destructuring assignment parses more of what was formerly 
rejected by semantic checking: {p: q} = o destructures o.p into q (which must 
be declared in Harmony -- it is an error if no such q was declared in scope).

We can certainly write semantic rules for destructuring that validate an object
literal as an object pattern; ditto for arrays. But the LR(1) grammar does not
by itself specify the valid sentences of the language, just as it has not all
these years for assignment expressions.
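A sketch of such a semantic rule, assuming hypothetical ESTree-style node
shapes: the object or array literal is parsed by the ordinary expression
grammar, then reinterpreted as a pattern, or rejected when some leaf (say the
42 in {p: 42} = o) cannot be a target. The declared set models the Harmony rule
above that q must already be declared; reporting that as a SyntaxError, and
accepting only identifiers and nested literals, are assumptions of the sketch.

```javascript
// Hypothetical ESTree-style node shapes, for illustration only.
// Reinterprets an expression AST as a destructuring pattern, or throws.
// `declared` models the rule that every target variable must already be
// declared in scope (the error kind is an assumption of this sketch).
function toPattern(node, declared) {
  switch (node.type) {
    case "Identifier":
      if (!declared.has(node.name)) {
        throw new SyntaxError("Undeclared destructuring target: " + node.name);
      }
      return node;
    case "ObjectExpression": // {p: q} becomes an object pattern
      return {
        type: "ObjectPattern",
        properties: node.properties.map((prop) => ({
          key: prop.key,
          value: toPattern(prop.value, declared),
        })),
      };
    case "ArrayExpression": // [a, , b] becomes an array pattern (holes kept)
      return {
        type: "ArrayPattern",
        elements: node.elements.map((el) => (el === null ? null : toPattern(el, declared))),
      };
    default:
      // e.g. the 42 in {p: 42} = o: a valid expression, not a valid pattern.
      throw new SyntaxError("Invalid destructuring target: " + node.type);
  }
}

// {p: q} = o, with q declared.
const objectLiteral = {
  type: "ObjectExpression",
  properties: [
    { key: { type: "Identifier", name: "p" }, value: { type: "Identifier", name: "q" } },
  ],
};
const objectPattern = toPattern(objectLiteral, new Set(["q"]));
```

Here objectPattern is an ObjectPattern whose single property binds p to q;
calling toPattern(objectLiteral, new Set()) instead throws, because q is not
declared.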

Now, for arrow functions (you already know this, just reciting for the 
es-discuss list) we could parse the ArrowFormalParameters : Expression and then 
write semantics to validate that comma expression as arrow function formal 
parameters.
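To make the cover-grammar idea concrete, here is a minimal recursive-descent
sketch for a made-up toy language (identifiers, integers, +, commas,
parentheses and =>); it is not the strawman grammar, and tokenize, parse and
coverToParams are hypothetical names. A parenthesized comma expression is
parsed as an ordinary expression, and only when => follows is that expression
reinterpreted as a parameter list.

```javascript
// Toy language (hypothetical): identifiers, integers, "+", ",", "(", ")", "=>".
function tokenize(src) {
  const re = /\s*(=>|[A-Za-z_$][\w$]*|\d+|[(),+])/y;
  const text = src.trim();
  const tokens = [];
  while (re.lastIndex < text.length) {
    const at = re.lastIndex;
    const m = re.exec(text);
    if (m === null) throw new SyntaxError("Unexpected character at " + at);
    tokens.push(m[1]);
  }
  return tokens;
}

// Reinterpret an already-parsed covering expression as arrow parameters.
function coverToParams(node) {
  if (node.type === "Identifier") return [node.name]; // x => ...
  if (node.type !== "Paren") throw new SyntaxError("Invalid arrow parameters");
  const items = node.expr.type === "Sequence" ? node.expr.items : [node.expr];
  return items.map((item) => {
    if (item.type !== "Identifier") {
      throw new SyntaxError("Invalid arrow parameter: " + item.type);
    }
    return item.name;
  });
}

function parse(src) {
  const tokens = tokenize(src);
  let pos = 0;
  const peek = () => tokens[pos];
  const expect = (t) => {
    if (tokens[pos] !== t) throw new SyntaxError("Expected " + t + ", got " + tokens[pos]);
    pos++;
  };

  // Expression : AssignmentExpression ("," AssignmentExpression)*
  function parseExpression() {
    const items = [parseAssignment()];
    while (peek() === ",") {
      pos++;
      items.push(parseAssignment());
    }
    return items.length === 1 ? items[0] : { type: "Sequence", items };
  }

  // Parse the left side as an ordinary expression; decide it was a parameter
  // list only once "=>" shows up, then validate it via coverToParams.
  function parseAssignment() {
    const left = parseAdditive();
    if (peek() !== "=>") return left;
    pos++;
    return { type: "Arrow", params: coverToParams(left), body: parseAssignment() };
  }

  function parseAdditive() {
    let left = parsePrimary();
    while (peek() === "+") {
      pos++;
      left = { type: "Add", left, right: parsePrimary() };
    }
    return left;
  }

  function parsePrimary() {
    const t = peek();
    if (t === "(") {
      pos++;
      const expr = parseExpression();
      expect(")");
      return { type: "Paren", expr }; // keep the parens: they matter for the cover
    }
    if (t !== undefined && /^\d+$/.test(t)) {
      pos++;
      return { type: "Number", value: Number(t) };
    }
    if (t !== undefined && /^[A-Za-z_$]/.test(t)) {
      pos++;
      return { type: "Identifier", name: t };
    }
    throw new SyntaxError("Unexpected token " + t);
  }

  const result = parseExpression();
  if (pos !== tokens.length) throw new SyntaxError("Unexpected token " + tokens[pos]);
  return result;
}
```

With this sketch, parse("(a, b)") yields a plain parenthesized sequence,
parse("(a, b) => a + b") yields an Arrow with params ["a", "b"], and
parse("(a + 1) => a") throws a SyntaxError because a + 1 is a valid expression
but not a valid parameter. The parser never backtracks: it defers the decision
until it sees =>, which is how a top-down parser can cope without unbounded
lookahead, at the price of a separate validation step.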

Right now, the expression grammar and the formal parameter list grammar are 
"close". They have already diverged in Harmony due to rest and spread not being 
lookalikes: spread (http://wiki.ecmascript.org/doku.php?id=harmony:spread) 
allows ... AssignmentExpression, while rest wants only ... Identifier.

But we can still cope: the Expression grammar is a cover grammar for
FormalParameterList.
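A sketch of the extra check that the rest/spread divergence forces, over
hypothetical ESTree-style AST shapes. It assumes the covering parse admits
...AssignmentExpression items inside the parentheses (as spread allows
elsewhere) and then narrows them to the FormalParameterList rules: plain
identifiers plus a rest parameter ...Identifier, which this sketch also
requires to come last (an assumption here). The function name
coverToFormalParameters is made up for illustration.

```javascript
// Hypothetical AST shapes: items of a covering list that was parsed with
// the expression grammar, where a spread may wrap any AssignmentExpression.
// Narrow them to formal parameters: identifiers, plus an optional trailing
// rest parameter of the form ...Identifier.
function coverToFormalParameters(items) {
  const params = [];
  items.forEach((item, i) => {
    if (item.type === "Identifier") {
      params.push({ name: item.name, rest: false });
    } else if (item.type === "SpreadElement") {
      if (item.argument.type !== "Identifier") {
        // f(...g()) is a fine spread, but (...g()) => 0 has no rest parameter.
        throw new SyntaxError("Rest parameter must be an identifier");
      }
      if (i !== items.length - 1) {
        throw new SyntaxError("Rest parameter must be last");
      }
      params.push({ name: item.argument.name, rest: true });
    } else {
      throw new SyntaxError("Invalid formal parameter: " + item.type);
    }
  });
  return params;
}

// (a, ...rest) => ... : accepted; the rest parameter is an identifier and last.
const formals = coverToFormalParameters([
  { type: "Identifier", name: "a" },
  { type: "SpreadElement", argument: { type: "Identifier", name: "rest" } },
]);
```

Here formals is [{ name: "a", rest: false }, { name: "rest", rest: true }]; a
covering list containing ...g() or a non-final ...r is rejected with a
SyntaxError.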

Of course, the two sub-grammars may diverge in a way we can't handle by parsing
a comma expression within the parentheses that come before the arrow. Guards
seem likely to make the parameter syntax diverge, unless they can also be used
in expressions (the strawman does not allow that).

The conclusion I draw from these challenges, some already dealt with
non-grammatically by ES1-5, is that we should not make a sacred cow out of
LR(1). We should be open to a formalism that is as checkable for ambiguities as
LR(1), and that can cope with the C heritage we already have (assignment
expressions) as well as with new syntax.

/be

_______________________________________________
es-discuss mailing list
https://mail.mozilla.org/listinfo/es-discuss
