Thanks for making a strawman.

Unicode Escape Sequences
Is it possible for U+ to accept 4-, 5-, or 6-digit sequences?  When I encounter 
U+ notation the leading zeros are typically omitted, and I see BMP characters 
quite often.  BMP characters could obviously use the U notation; however, it 
seems like it'd be annoying for the occasional user to have to know that U is 
used for some characters and U+ for others.  It seems easier for developers to 
remember that U+ is "the new way" and U is "the old way that doesn't always 
work".
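
To make the 4-to-6-digit idea concrete, here is a minimal sketch of a parser 
for a standalone U+ token. The function name parseUPlus and its error behavior 
are my own assumptions for illustration, not anything taken from the strawman.

```javascript
// Hypothetical helper (not from the strawman): parse "U+" followed by
// 4, 5, or 6 hex digits and return the code point as a number.
function parseUPlus(text) {
  const match = /^U\+([0-9A-Fa-f]{4,6})$/.exec(text);
  if (match === null) {
    throw new SyntaxError("expected U+ followed by 4 to 6 hex digits: " + text);
  }
  const codePoint = parseInt(match[1], 16);
  // Six hex digits can express values above the Unicode maximum.
  if (codePoint > 0x10FFFF) {
    throw new RangeError("code point out of Unicode range: " + text);
  }
  return codePoint;
}

console.log(parseUPlus("U+0041"));   // 65
console.log(parseUPlus("U+1F600"));  // 128512
console.log(parseUPlus("U+10FFFF")); // 1114111
```

This only handles a token on its own. Inside a string literal, a 
variable-length form would also need a rule for where the digits end, since 
the next character of the string may itself be a hex digit.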

String Position
It's unclear to me whether the string indices can be "changed" from UTF-16 to 
UTF-32 positions.  Although UTF-32 indices are clearly desirable, I think that 
many implementations currently allow the UTF-16 surrogate code units (U+D800 
through U+DFFF).  In other words, I can already have Javascript strings with 
full Unicode range data in them.  Existing applications would then hold indices 
that refer to UTF-16 positions, not UTF-32 positions.  Changing the definition 
of the index to UTF-32 would, I think, break those applications.
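
A small sketch of the concern, using a string that holds U+1F600 as a 
surrogate pair. The helper countCodePoints is hypothetical and written only 
for this illustration; it shows what a UTF-32 style count would be for the 
same data.

```javascript
// U+1F600 written as its UTF-16 surrogate pair, which existing engines accept.
const sample = "a\uD83D\uDE00b";

console.log(sample.length);        // 4 (UTF-16 code units)
console.log(sample.indexOf("b"));  // 3 (a code unit index)

// Hypothetical helper: count code points by treating each valid
// high/low surrogate pair as a single position.
function countCodePoints(str) {
  let count = 0;
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    const isHigh = unit >= 0xD800 && unit <= 0xDBFF;
    const next = i + 1 < str.length ? str.charCodeAt(i + 1) : 0;
    if (isHigh && next >= 0xDC00 && next <= 0xDFFF) {
      i++; // skip the low surrogate that completes the pair
    }
    count++;
  }
  return count;
}

console.log(countCodePoints(sample)); // 3
```

Under code point indexing "b" would sit at index 2 rather than 3, so an 
application that stored the value 3 would point past it.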

You also touch on that with charCodeAt/codepointAt, which resolves the problem 
with the output type but doesn't address the problem with the indexing.  
Similar to the way you differentiated charCode/codepoint, it may be necessary 
to differentiate charCode/codepoint indices.  IMO .fromCharCode doesn't have 
this problem, since it used to fail but now works, which wouldn't be a breaking 
change, unless we're concerned that it can now return a different UTF-16 length 
than before.
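
To illustrate the difference between the two kinds of index, here is a sketch 
that uses String.prototype.codePointAt and String.fromCodePoint as they were 
later standardized in ECMAScript 2015; the strawman's own names and semantics 
may differ. The helper codeUnitIndexOf is hypothetical.

```javascript
// Hypothetical helper: convert a code point index into the UTF-16
// code unit index that existing APIs such as charCodeAt expect.
function codeUnitIndexOf(str, codePointIndex) {
  let unitIndex = 0;
  for (let n = 0; n < codePointIndex; n++) {
    if (unitIndex >= str.length) {
      throw new RangeError("code point index out of range");
    }
    // codePointAt returns a value above 0xFFFF only when a valid
    // surrogate pair starts at this code unit.
    unitIndex += str.codePointAt(unitIndex) > 0xFFFF ? 2 : 1;
  }
  return unitIndex;
}

const text = "a\uD83D\uDE00b";

console.log(text.charCodeAt(1));        // 55357 (0xD83D, one code unit)
console.log(text.codePointAt(1));       // 128512 (0x1F600, the whole code point)
console.log(codeUnitIndexOf(text, 2));  // 3, where "b" is in code units

// fromCharCode truncates to 16 bits; fromCodePoint emits a surrogate pair.
console.log(String.fromCharCode(0x1F600).length);  // 1
console.log(String.fromCodePoint(0x1F600).length); // 2
```

Note that codePointAt as standardized still takes a code unit index, so it 
changes the output type without changing the indexing.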

I don't like the "21" in the name of decodeURI21.  Also, the "trick", I think, 
is encoding to surrogate pairs (illegally, since UTF-8 doesn't allow that) vs. 
decoding to UTF-16.  It seems like decoding can safely detect supplementary 
characters in the input and properly decode them, or is there something about 
the encoding that makes that state undetectable?
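
As a rough illustration of the detection question, this sketch uses the 
existing decodeURIComponent and encodeURIComponent in current engines, not the 
strawman's decodeURI21. The helper tryDecode is hypothetical and only wraps 
the call so a failure can be printed.

```javascript
// Percent-encoded UTF-8 for U+1F600: one four-byte sequence.
const wellFormed = "%F0%9F%98%80";
// The same character with each surrogate encoded separately as three
// bytes, which is not valid UTF-8.
const surrogateBytes = "%ED%A0%BD%ED%B8%80";

// Hypothetical helper: return the decoded string, or the error name on failure.
function tryDecode(encoded) {
  try {
    return decodeURIComponent(encoded);
  } catch (e) {
    return e.name;
  }
}

const decoded = tryDecode(wellFormed);
console.log(decoded.length);                      // 2 (a UTF-16 surrogate pair)
console.log(decoded === "\uD83D\uDE00");          // true
console.log(tryDecode(surrogateBytes));           // URIError
console.log(encodeURIComponent("\uD83D\uDE00"));  // %F0%9F%98%80
```

The four-byte lead byte is what lets a decoder recognize a supplementary 
character, so at least in this direction the state appears detectable.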

-Shawn

From: [email protected] [mailto:[email protected]] On 
Behalf Of Allen Wirfs-Brock
Sent: Monday, May 16, 2011 11:12 AM
To: [email protected]
Subject: Full Unicode strings strawman

I tried to post a pointer to this strawman on this list a few weeks ago, but 
apparently it didn't reach the list for some reason.

Feedback would be appreciated:

http://wiki.ecmascript.org/doku.php?id=strawman:support_full_unicode_in_strings

Allen
_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss
