On May 7, 2011, at 2:39 PM, Claus Reinke wrote:
>> Consider this: w = (x)->y || z
>> That code is not obvious at all. Which of these would it be?
>> 1: w = function (x) { return y } || z
>> 2: w = function (x) { return y || z }
>> It seems to me that there must be some sort of delineation around the
>> function start and end.
Either that, or around the function body (expression or statement).
> But such delineation does not have to lead to required syntax
> in the function construct. Such required delineations were not
> part of lambda-calculus originally **, they arose out of implementation
> limitations that can be overcome.
We're talking JS syntax here, not anything like the original Lambda calculus or
s-expressions.
> I know that takes some getting used to (been there myself*), but there is an
> established standard, both in publications on
> programming language theory, and in some real languages:
> function bodies extend as far as possible **
>
> So your examples would be
>
> 1: w = (function (x) return y) || z
> 2: w = (function (x) return y || z)
This does not work, because the ECMA-262 grammar simplifies to something like
this toy bison grammar:
%token FUNCTION
%token NUMBER
%%
Expr: Expr '+' Term
| Term
;
Term: Term '*' Fact
| Fact '(' ')'
| Fact
;
Fact: FUNCTION '(' ')' '{' Expr '}'
| '(' Expr ')'
| NUMBER
;
Notice this grammar parses function expressions with empty parameter lists, and
function calls with empty argument lists. No statements, only expressions, and
only numbers with addition and multiplication (and numbers are syntactically
callable, as in JS), but the essential grammatical structure remains.
Now let's try to extend it to allow an unbraced body for the analogue of ES5's
PrimaryExpression, here called Fact:
%token FUNCTION
%token NUMBER
%%
Expr: Expr '+' Term
| Term
;
Term: Term '*' Fact
| Fact '(' ')'
| Fact
;
Fact: FUNCTION '(' ')' Fact
| '(' FUNCTION '(' ')' Expr ')'
| '(' Expr ')'
| NUMBER
;
That is, we allow an unparenthesized high-precedence expression (a Fact) as the
body of a function expression, or else require parentheses around an entire
function expression whose body is an unparenthesized low-precedence expression
(an Expr).
The results in reduce/reduce conflicts:
state 23
4 Term: Fact . '(' ')'
5 | Fact .
6 Fact: FUNCTION '(' ')' Fact .
'(' shift, and go to state 13
'+' reduce using rule 5 (Term)
'+' [reduce using rule 6 (Fact)]
'*' reduce using rule 5 (Term)
'*' [reduce using rule 6 (Fact)]
'(' [reduce using rule 6 (Fact)]
')' reduce using rule 5 (Term)
')' [reduce using rule 6 (Fact)]
$default reduce using rule 5 (Term)
In other words, we've created a hopeless ambiguity between an unparenthesized
function expression with a high-precedence body expression and a
parenthesized-on-the-outside function expression with a low-precedence body.
There's a shift/reduce conflict too, but that is not the concern here. The
problem is that no amount of lookahead can decide by which production to
reduce. We must give up one of the two function expression forms.
One might be tempted to make an unparenthesized (on the outside) function
expression with an unparenthesized expression body be a low-precedence
expression (say, AssignmentExpression), but then it can't be used as the callee
part of a CallExpression without being parenthesized. Such a low-precedence
function expression would have to be parenthesized with every operator save
comma.
In practice, function expressions aren't operated on except by being on the
right-hand side of assignment, by calling them to make a CallExpression, and by
concatenating their string conversions, but requiring parens for all
sub-expression uses of function expressions is a big break from ECMA-262's
grammar since ES3, which introduced function expressions as PrimaryExpressions
(highest precedence).
Here is a more complete toy ES sub-grammar, still not too big:
%token FUNCTION
%token IDENT
%token NUMBER
%token PLUSPLUS
%%
Expr: Expr '+' Term
| Term
;
Term: Term '*' Fact
| Fact
;
Fact: PLUSPLUS Fact
| Post
;
Post: Call PLUSPLUS
| Call
;
Call: Call '(' ')'
| Prim
;
Prim: FUNCTION '(' ')' '{' Expr '}'
| '(' Expr ')'
| IDENT
| NUMBER
;
If we make the mistake ES4 made (implemented in SpiderMonkey still) of adding a
Prim(aryExpression) right-hand side:
Prim: FUNCTION '(' ')' Expr
to allow unbraced expression bodies for function expressions, the
low-precedence Expr at the end of high-precedence Prim makes a nasty loop in
the grammar. Waldemar pointed this out a while ago
(http://www.mail-archive.com/[email protected]/msg01008.html). His
counter-example used a "let" expression, but there's a function expression
equivalent:
function () a ? f : x++();
Per the full ES grammar, and the toy just above, you cannot call a postfix-++
expression. But the whole of |function () a ? f : x++| should be a function
expression with an unparenthesized Expr body, per the added right-hand side.
And a function expression should be callable without having to be
parenthesized, since it is a Prim(aryExpression).
This pretty much dooms arrow syntax to either requiring parentheses or braces
around the arrow-function's body, always (one or the other; there's a separate
issue with object initialisers Doug raised, I'll deal with it in a separate
message); or else require parentheses or some other bracketing (as Isaac
pointed out re: Ruby-ish blocks) on the outside. No way to be paren-free.
BTW, "maximal munch" refers to lexing, a DFA thing. Not to parsing regular
languages, which requires a stack to decide how to recognize sentences in the
language based on their prefixes and one or more tokens of lookahead.
Anyway, even though JS's grammar does not allow us to reason about function
syntax as if we had the lambda calculus, I appreciated your attempt to deal
with the cases Isaac came up with. I wish we could parse a low-precedence
unparenthesized function body expression. It does not look feasible.
/be
_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss