On May 7, 2011, at 2:39 PM, Claus Reinke wrote:

>> Consider this: w = (x)->y || z
>> That code is not obvious at all.  Which of these would it be?
>> 1: w = function (x) { return y } || z
>> 2: w = function (x) { return y || z }
>> It seems to me that there must be some sort of delineation around the 
>> function start and end.  

Either that, or around the function body (expression or statement).


> But such delineation does not have to lead to required syntax
> in the function construct. Such required delineations were not
> part of lambda-calculus originally **, they arose out of implementation 
> limitations that can be overcome.

We're talking JS syntax here, not anything like the original Lambda calculus or 
s-expressions.


> I know that takes some getting used to (been there myself*), but there is an 
> established standard, both in publications on
> programming language theory, and in some real languages: 
>   function bodies extend as far as possible **
> 
> So your examples would be
> 
> 1: w = (function (x) return y) || z
> 2: w = (function (x) return y || z)

This does not work, because the ECMA-262 grammar simplifies to something like 
this toy bison grammar:

%token FUNCTION
%token NUMBER

%%

Expr: Expr '+' Term
    | Term
    ;

Term: Term '*' Fact
    | Fact '(' ')'
    | Fact
    ;

Fact: FUNCTION '(' ')' '{' Expr '}'
    | '(' Expr ')'
    | NUMBER
    ;

Notice this grammar parses function expressions with empty parameter lists, and 
function calls with empty argument lists. No statements, only expressions, and 
only numbers with addition and multiplication (and numbers are syntactically 
callable, as in JS), but the essential grammatical structure remains.

Now let's try to extend it to allow an unbraced body for the analogue of ES5's 
PrimaryExpression, here called Fact:

%token FUNCTION
%token NUMBER

%%

Expr: Expr '+' Term
    | Term
    ;

Term: Term '*' Fact
    | Fact '(' ')'
    | Fact
    ;

Fact: FUNCTION '(' ')' Fact
    | '(' FUNCTION '(' ')' Expr ')'
    | '(' Expr ')'
    | NUMBER
    ;

That is, we allow an unparenthesized high-precedence expression (a Fact) as the 
body of a function expression, or else require parentheses around an entire 
function expression whose body is an unparenthesized low-precedence expression 
(an Expr).

The results in reduce/reduce conflicts:

state 23

    4 Term: Fact . '(' ')'
    5     | Fact .
    6 Fact: FUNCTION '(' ')' Fact .

    '('  shift, and go to state 13

    '+'       reduce using rule 5 (Term)
    '+'       [reduce using rule 6 (Fact)]
    '*'       reduce using rule 5 (Term)
    '*'       [reduce using rule 6 (Fact)]
    '('       [reduce using rule 6 (Fact)]
    ')'       reduce using rule 5 (Term)
    ')'       [reduce using rule 6 (Fact)]
    $default  reduce using rule 5 (Term)

In other words, we've created a hopeless ambiguity between an unparenthesized 
function expression with a high-precedence body expression and a 
parenthesized-on-the-outside function expression with a low-precedence body.

There's a shift/reduce conflict too, but that is not the concern here. The 
problem is that no amount of lookahead can decide by which production to 
reduce. We must give up one of the two function expression forms.

One might be tempted to make an unparenthesized (on the outside) function 
expression with an unparenthesized expression body be a low-precedence 
expression (say, AssignmentExpression), but then it can't be used as the callee 
part of a CallExpression without being parenthesized. Such a low-precedence 
function expression would have to be parenthesized with every operator save 
comma.

In practice, function expressions aren't operated on except by being on the 
right-hand side of assignment, by calling them to make a CallExpression, and by 
concatenating their string conversions, but requiring parens for all 
sub-expression uses of function expressions is a big break from ECMA-262's 
grammar since ES3, which introduced function expressions as PrimaryExpressions 
(highest precedence).

Here is a more complete toy ES sub-grammar, still not too big:

%token FUNCTION
%token IDENT
%token NUMBER
%token PLUSPLUS

%%

Expr: Expr '+' Term
    | Term
    ;

Term: Term '*' Fact
    | Fact
    ;

Fact: PLUSPLUS Fact
    | Post
    ;

Post: Call PLUSPLUS
    | Call
    ;

Call: Call '(' ')'
    | Prim
    ;

Prim: FUNCTION '(' ')' '{' Expr '}'
    | '(' Expr ')'
    | IDENT
    | NUMBER
    ;

If we make the mistake ES4 made (implemented in SpiderMonkey still) of adding a 
Prim(aryExpression) right-hand side:

Prim: FUNCTION '(' ')' Expr

to allow unbraced expression bodies for function expressions, the 
low-precedence Expr at the end of high-precedence Prim makes a nasty loop in 
the grammar. Waldemar pointed this out a while ago 
(http://www.mail-archive.com/[email protected]/msg01008.html). His 
counter-example used a "let" expression, but there's a function expression 
equivalent:

  function () a ? f : x++();

Per the full ES grammar, and the toy just above, you cannot call a postfix-++ 
expression. But the whole of |function () a ? f : x++| should be a function 
expression with an unparenthesized Expr body, per the added right-hand side. 
And a function expression should be callable without having to be 
parenthesized, since it is a Prim(aryExpression).

This pretty much dooms arrow syntax to either requiring parentheses or braces 
around the arrow-function's body, always (one or the other; there's a separate 
issue with object initialisers Doug raised, I'll deal with it in a separate 
message); or else require parentheses or some other bracketing (as Isaac 
pointed out re: Ruby-ish blocks) on the outside. No way to be paren-free.

BTW, "maximal munch" refers to lexing, a DFA thing. Not to parsing regular 
languages, which requires a stack to decide how to recognize sentences in the 
language based on their prefixes and one or more tokens of lookahead.

Anyway, even though JS's grammar does not allow us to reason about function 
syntax as if we had the lambda calculus, I appreciated your attempt to deal 
with the cases Isaac came up with. I wish we could parse a low-precedence 
unparenthesized function body expression. It does not look feasible.

/be
_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss

Reply via email to