On Aug 24, 2013, at 5:42 , Mathias Bynens <math...@qiwi.be> wrote:

> To clarify: consider what the Identifier Identification strawman[1] or any 
> scripts that emulate similar behavior should do if Allen’s suggestion would 
> be implemented:
> 
>    String.isIdentifierStart('\uD87E\uDC00'); // should be `false`
>    String.isIdentifierStart('\u{2F800}'); // should be `true`
>    // this is impossible, since `'\uD87E\uDC00' === '\u{2F800}'` and there is
>    // no way to distinguish these strings
> 
> [1] http://wiki.ecmascript.org/doku.php?id=strawman:identifier_identification

On Aug 24, 2013, at 14:19 , Mathias Bynens <math...@qiwi.be> wrote:

> I just want to make sure it’s possible to write a polyfill (in ES5) for the 
> `String.isIdentifier{Start,Part}` strawman. As long as 
> `String.isIdentifierStart('\uD87E\uDC00')` and 
> `String.isIdentifierStart('\u{2F800}')` are expected to return different 
> results (as Allen suggests), this is impossible.

Allen didn't discuss these functions; the strawman didn't exist during the 
previous round of this discussion. Your code uses string literals, and in ES6 
the string literals '\uD87E\uDC00' and '\u{2F800}' evaluate to the same string. 
This means the functions proposed in my Identifier Identification strawman 
cannot tell the difference, but then the specification doesn't require them to.
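
A quick check of that equivalence, as a minimal sketch that assumes an engine 
supporting the ES6 \u{...} escape in string literals (the variable names are 
only illustrative):

```javascript
// Both literals denote the same two UTF-16 code units, so the resulting
// strings cannot be told apart at run time.
const viaSurrogates = '\uD87E\uDC00';
const viaCodePoint = '\u{2F800}';

const sameString = viaSurrogates === viaCodePoint;
const unitCount = viaCodePoint.length;           // length counts code units
const codePoint = viaSurrogates.codePointAt(0);  // decodes the surrogate pair

console.log(sameString, unitCount, codePoint.toString(16));
```

Whatever a function receives here, it has no record of which escape form 
produced it.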

What Allen suggested, and the current ES6 spec says, is that identifiers in 
source text using different Unicode escape forms behave differently: 
   var \uD87E\uDC00;
is rejected with a SyntaxError, while
   var \u{2F800};
declares a variable.
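
The difference can be observed by handing both forms to the parser. This is a 
rough sketch that assumes an engine implementing the ES6 identifier rules as 
described above; `parses` is a hypothetical helper, not part of any proposal:

```javascript
// Compile a snippet without running it in the current scope and report
// whether the parser accepted it.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e; // anything else is unexpected
  }
}

// Backslashes are doubled so that the escapes reach the parser intact
// instead of being resolved by the string literal first.
const surrogateEscapes = parses('var \\uD87E\\uDC00;');
const codePointEscape = parses('var \\u{2F800};');

console.log(surrogateEscapes, codePointEscape);
```

Under those rules the first call should report a rejection and the second an 
accepted declaration.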

I don't think that's a technical problem. String.isIdentifier{Start,Part}, as I 
proposed them, don't deal with actual identifiers in source text; they check 
individual identifier characters. The functions are intended to be called by a 
parser, and it's up to the parser to deal with escaping rules, throwing 
exceptions or unescaping as specified before passing code points to 
String.isIdentifier{Start,Part}. Calling the functions with string literals 
doesn't seem like a realistic use case.
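
To illustrate the division of labor, here is a minimal sketch of how a parser 
might decode an escape itself and only then ask about a single code point. 
Everything in it is an assumption for illustration: `isIdentifierStart` is a 
stand-in built on the `\p{ID_Start}` regular expression property escape (which 
needs a much newer engine than ES5), and `readUnicodeEscape` and 
`readIdentifierStartEscape` are hypothetical parser helpers.

```javascript
// Stand-in for the strawman's function, taking one code point (assumption).
const isIdentifierStart = (cp) =>
  cp === 0x24 || cp === 0x5F || // $ and _
  /^\p{ID_Start}$/u.test(String.fromCodePoint(cp));

// Decode one Unicode escape at `pos`; return its code point and the
// position just past the escape, or throw on a malformed escape.
function readUnicodeEscape(source, pos) {
  const m = /^\\u(?:\{([0-9a-fA-F]+)\}|([0-9a-fA-F]{4}))/.exec(source.slice(pos));
  if (!m) throw new SyntaxError('Malformed Unicode escape at ' + pos);
  const codePoint = parseInt(m[1] || m[2], 16);
  if (codePoint > 0x10FFFF) throw new SyntaxError('Code point out of range at ' + pos);
  return { codePoint, next: pos + m[0].length };
}

// The parser applies the escaping rules first: each \uXXXX escape is taken
// on its own, so a surrogate escape is never paired up with the next one.
function readIdentifierStartEscape(source, pos) {
  const { codePoint, next } = readUnicodeEscape(source, pos);
  if (!isIdentifierStart(codePoint)) {
    throw new SyntaxError('Escape does not denote an identifier start at ' + pos);
  }
  return { codePoint, next };
}

const accepted = readIdentifierStartEscape('\\u{2F800}', 0);
let rejected = false;
try {
  readIdentifierStartEscape('\\uD87E\\uDC00', 0);
} catch (e) {
  rejected = e instanceof SyntaxError;
}
console.log(accepted.codePoint.toString(16), rejected);
```

In this arrangement the distinction between the two escape forms lives 
entirely in the parser; the character test only ever sees 0x2F800 or 0xD87E.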

I do think it's a problem for learning and understanding the language. Having 
different rules for \uD87E\uDC00 in string literals and identifiers, and 
therefore also for identifiers embedded in strings passed to eval(), adds yet 
another of those random inconsistencies that already litter ECMAScript, and 
ensures a "wat" moment for everybody who comes across them.

On a side note, the strawman hasn't been discussed by TC39 and hasn't been 
accepted for either ES6 or ES7, so it may be a bit premature to polyfill it. 
Informal feedback from some members indicated that they'd rather discuss it in 
the context of a complete proposal for Unicode character properties support.

Norbert
_______________________________________________
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss
