Allen Wirfs-Brock wrote:
On Feb 19, 2012, at 2:15 PM, Brendan Eich wrote:
I'm not a Unicode expert but I believe the latter is called "character".

Me neither, but I believe the correct term is "code point", which refers to the full 21-bit code, while 
"Unicode character" is the logical entity corresponding to that code point.   That usage of 
"character" is different from the current usage within ECMAScript, where "character" is what we 
call the elements of the vector of 16-bit numbers that are used to represent a String value.   You can access them as 
string values of length 1 via [ ] or as numeric values via the charCodeAt method.
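
To make the ECMAScript sense of "character" concrete, here is a minimal sketch of the two access paths just mentioned. The sample string and the variable names are illustrative choices, not anything from the spec text.

```javascript
// U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP, so the String value
// holds it as two 16-bit elements: 0xD834 followed by 0xDD1E.
const clef = "\uD834\uDD1E";

const elementCount = clef.length;       // counts 16-bit elements, not code points
const firstElement = clef[0];           // a String of length 1 holding only 0xD834
const firstCode = clef.charCodeAt(0);   // numeric value of element 0
const secondCode = clef.charCodeAt(1);  // numeric value of element 1

console.log(elementCount, firstElement.length,
            firstCode.toString(16), secondCode.toString(16));
```

Each ECMAScript "character" here is half of one Unicode code point, which is the transposition of terms at issue.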

Thanks. We have a confusing transposition of terms between Unicode and ECMA-262, it seems. Should we fix?

JS must keep the "\uXXXX" notation for uint16 storage units, and one can create 
invalid Unicode strings already. This hazard does not go away, we keep compatibility, but 
the BRS adds no new hazards and in practice, if well-used, should reduce the incidence of 
invalid-Unicode-string bugs.
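
A small sketch of the existing hazard: "\uXXXX" can produce a string containing an unpaired surrogate. The helper `hasOnlyPairedSurrogates` is a hypothetical name introduced here for illustration; it is not part of any proposal in this thread.

```javascript
// Hypothetical helper: true when every surrogate element in `s` belongs to a
// high-surrogate/low-surrogate pair, i.e. the string is valid as UTF-16.
function hasOnlyPairedSurrogates(s) {
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit >= 0xD800 && unit <= 0xDBFF) {
      // A high surrogate must be followed by a low surrogate.
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next < 0xDC00 || next > 0xDFFF) return false;
      i++; // skip the low surrogate just checked
    } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
      // A low surrogate with no preceding high surrogate.
      return false;
    }
  }
  return true;
}

const lone = "a\uD834b";          // legal JS string, invalid Unicode
const paired = "a\uD834\uDD1Eb";  // same high surrogate, properly paired

console.log(hasOnlyPairedSurrogates(lone), hasOnlyPairedSurrogates(paired));
```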

The "\u{...}" notation is independent and should work whatever the BRS setting, IMHO. In "UCS-2" 
(default) setting, "\u{...}" can make pairs. In "UTF-16" setting, it makes only characters. And of 
course in the latter case indexing and length count characters.
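
A sketch of the default-setting behavior, assuming an engine that accepts the "\u{...}" escape and provides `codePointAt` and `Array.from` (all ES2015 features). The BRS cannot be flipped from script here, so the full-Unicode count is only approximated by iterating the string by code point.

```javascript
// In the default (16-bit element) setting, "\u{...}" above U+FFFF makes a pair.
const braced = "\u{1D11E}";
const explicitPair = "\uD834\uDD1E";

const sameString = braced === explicitPair;   // both spellings give the same elements
const unitLength = braced.length;             // length counts 16-bit elements
const codePoint = braced.codePointAt(0);      // reads the pair as one code point

// Approximation of what length would report if it counted code points:
// Array.from iterates a string by code point, not by 16-bit element.
const codePointCount = Array.from(braced).length;

console.log(sameString, unitLength, codePoint.toString(16), codePointCount);
```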

I think your names for the BRS modes are misleading.

You got me; in fact, I used "full Unicode" for the BRS-thrown setting elsewhere.

My implementor's bias is showing, because I expect many engines would use UTF-16 internally and have non-O(1) indexing for strings with the contains-non-BMP-and-BRS-set-to-full-Unicode flag bit.
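
To show why such indexing would not be O(1), here is a rough sketch of code-point indexing over UTF-16 storage. `codePointAtIndex` is a hypothetical function written for illustration, not an engine internal; it assumes `String.prototype.codePointAt` (ES2015) is available.

```javascript
// Hypothetical: return the code point at code-point position `index`,
// given a string stored as 16-bit elements. Every call scans from the start,
// because a supplementary code point occupies two elements and shifts all
// later positions.
function codePointAtIndex(s, index) {
  let unitPos = 0; // position in 16-bit elements
  for (let seen = 0; unitPos < s.length; seen++) {
    const cp = s.codePointAt(unitPos);
    if (seen === index) return cp;
    unitPos += cp > 0xFFFF ? 2 : 1; // step over one or two elements
  }
  return undefined; // index is past the last code point
}

const mixed = "a\u{1D11E}b"; // 4 elements, 3 code points

console.log(codePointAtIndex(mixed, 1).toString(16), codePointAtIndex(mixed, 2).toString(16));
```

An engine could skip the scan for strings without the contains-non-BMP flag bit, since element index and code-point index coincide there.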

/be
_______________________________________________
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss
