On 05/16/11 11:11, Allen Wirfs-Brock wrote:
I tried to post a pointer to this strawman on this list a few weeks ago, but 
apparently it didn't reach the list for some reason.

Feedback would be appreciated:

http://wiki.ecmascript.org/doku.php?id=strawman:support_full_unicode_in_strings

Allen

Two different languages made different decisions on how to approach extending 
their character sets as Unicode evolved:

- Java kept its strings encoded exactly as they were (a sequence of 16-bit 
code units) and provided extra APIs for the cases where you want to extract a 
code point.

- Perl widened the concept of characters in strings away from bytes to full 
Unicode characters.  Thus a UTF-8 string can be represented in either of two 
ways: each byte is one Perl character, or each Unicode character is one Perl 
character.  Conversion functions are provided to move between the two.
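
To make the Java side concrete, here is a minimal sketch of how the 16-bit 
code unit view and the code point view coexist on one string.  The class and 
helper names (CodePointDemo, codeUnits, codePoints) are made up for 
illustration; String.length, String.charAt, String.codePointAt and 
String.codePointCount are the standard library calls.

```java
public class CodePointDemo {
    // "a", then U+1D11E MUSICAL SYMBOL G CLEF (a non-BMP character stored
    // as the surrogate pair D834 DD1E), then "b".
    static final String SAMPLE = "a\uD834\uDD1Eb";

    // Number of 16-bit code units: what String.length() has always reported.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points, computed by one of the extra APIs.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println(codeUnits(SAMPLE));                          // 4
        System.out.println(codePoints(SAMPLE));                         // 3
        // charAt still returns a single code unit (the high surrogate) ...
        System.out.println(Integer.toHexString(SAMPLE.charAt(1)));      // d834
        // ... while codePointAt decodes the whole surrogate pair.
        System.out.println(Integer.toHexString(SAMPLE.codePointAt(1))); // 1d11e
    }
}
```

Existing code that indexes by code unit keeps working unchanged; only code 
that cares about non-BMP characters needs to call the code point APIs.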

My experience is that Java's approach worked, while Perl's has led to an 
endless shop of horrors.  The problem is that different APIs expect different 
kinds of strings, so I'm still finding places where conversions should be added 
but weren't (or vice versa) in a lot of code years after it was written.

1. I would not be in favor of any approach that widens the concept of a string 
character or introduces two different representations for a non-BMP character.  
It will suffer from the same problems as Perl, except that they will be harder 
to find because use of non-BMP characters is relatively rare.

2. Widening characters to 21 bits doesn't really help much.  As stated earlier 
in this thread, you still want to treat clumps of combining characters together 
with the character to which they combine, worry about various normalized forms, 
etc.  All of these require the machinery to deal with clumps of code units as 
though they were single characters/graphemes/etc., and once you have that 
machinery you can reuse it to support non-BMP characters while keeping string 
code units 16 bits.
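
As a rough illustration of point 2, the sketch below groups a base character 
with the combining marks that follow it, using Java because its strings are 
already 16-bit code units.  It is a deliberate simplification, not the full 
Unicode grapheme cluster algorithm, and the names ClumpDemo, isCombiningMark 
and clumps are hypothetical.  The thing to notice is that the loop advances by 
Character.charCount, so surrogate pairs are handled by the same machinery that 
handles combining sequences.

```java
import java.util.ArrayList;
import java.util.List;

public class ClumpDemo {
    // Simplified test for "combines with the preceding character":
    // Unicode general categories Mn, Mc and Me only.
    static boolean isCombiningMark(int codePoint) {
        int type = Character.getType(codePoint);
        return type == Character.NON_SPACING_MARK
            || type == Character.COMBINING_SPACING_MARK
            || type == Character.ENCLOSING_MARK;
    }

    // Split a string of 16-bit code units into clumps, each holding one base
    // code point plus any combining marks that immediately follow it.
    static List<String> clumps(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int start = i;
            // Step over the base code point: 1 code unit for BMP characters,
            // 2 for a surrogate pair.
            i += Character.charCount(s.codePointAt(i));
            // Absorb any trailing combining marks into the same clump.
            while (i < s.length() && isCombiningMark(s.codePointAt(i))) {
                i += Character.charCount(s.codePointAt(i));
            }
            out.add(s.substring(start, i));
        }
        return out;
    }

    public static void main(String[] args) {
        // "e" + U+0301 COMBINING ACUTE ACCENT, then U+1D11E, then "x":
        // 5 code units, 4 code points, 3 clumps.
        List<String> parts = clumps("e\u0301\uD834\uDD1Ex");
        System.out.println(parts.size()); // 3
    }
}
```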

    Waldemar