On 05/16/11 11:11, Allen Wirfs-Brock wrote:
I tried to post a pointer to this strawman on this list a few weeks ago, but
apparently it didn't reach the list for some reason.
Feedback would be appreciated:
http://wiki.ecmascript.org/doku.php?id=strawman:support_full_unicode_in_strings
Allen
Two different languages made different decisions on how to approach extending
their character sets as Unicode evolved:
- Java kept its strings encoded exactly as they were (a sequence of 16-bit
code units) and provided extra APIs for the cases where you want to extract a
code point.
- Perl widened the concept of characters in strings away from bytes to full
Unicode characters. Thus a UTF-8 string can be represented either with each
byte as one Perl character or with each Unicode character as one Perl
character. Conversion functions are provided to move between the two.
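To make the Java side concrete, here is a minimal sketch of what "16-bit code
units plus extra APIs" looks like in practice. The class name, the SAMPLE
constant and the helper methods are made up for illustration; the String
methods they call (length, charAt, codePointAt, codePointCount) are the
standard ones.

```java
public class CodePointDemo {
    // "a", then U+1D11E MUSICAL SYMBOL G CLEF (a non-BMP character, stored
    // as the surrogate pair D834 DD1E), then "b".
    static final String SAMPLE = "a\uD834\uDD1Eb";

    // Length in 16-bit code units: what String.length() has always returned.
    static int codeUnitLength(String s) {
        return s.length();
    }

    // Length in code points: one of the "extra APIs" added later.
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    // Code point starting at a given code unit index; a surrogate pair at
    // that index is combined into a single value.
    static int codePointAtUnit(String s, int index) {
        return s.codePointAt(index);
    }

    public static void main(String[] args) {
        System.out.println(codeUnitLength(SAMPLE));                        // 4
        System.out.println(codePointLength(SAMPLE));                       // 3
        System.out.println(Integer.toHexString(SAMPLE.charAt(1)));         // d834
        System.out.println(Integer.toHexString(codePointAtUnit(SAMPLE, 1))); // 1d11e
    }
}
```

Existing code that indexes by code unit keeps working unchanged; only code
that cares about code points needs to call the newer methods.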
My experience is that Java's approach worked, while Perl's has led to an
endless shop of horrors. The problem is that different APIs expect different
kinds of strings, so in a lot of code I'm still finding places, years after it
was written, where conversions should have been added but weren't (or vice
versa).
1. I would not be in favor of any approach that widens the concept of a string
character or introduces two different representations for a non-BMP character.
It will suffer from the same problems as Perl, except that they will be harder
to find because use of non-BMP characters is relatively rare.
2. Widening characters to 21 bits doesn't really help much. As stated earlier
in this thread, you still want to treat clumps of combining characters together
with the character to which they combine, worry about various normalized forms,
etc. All of these require the machinery to deal with clumps of code units as
though they were single characters/graphemes/etc., and once you have that
machinery you can reuse it to support non-BMP characters while keeping string
code units 16 bits.
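
As a rough illustration of point 2, the sketch below walks a string of 16-bit
code units and groups each base character with the combining marks that follow
it. It is deliberately simplified and is not the full Unicode grapheme cluster
algorithm; the class and method names are hypothetical. The point is only that
the same loop that steps over a clump of combining characters also steps over
a surrogate pair.

```java
import java.util.ArrayList;
import java.util.List;

public class ClumpDemo {
    // Simplifying assumption: a code point "combines" if its general
    // category is one of the three mark categories.
    static boolean isCombining(int codePoint) {
        int type = Character.getType(codePoint);
        return type == Character.NON_SPACING_MARK
            || type == Character.COMBINING_SPACING_MARK
            || type == Character.ENCLOSING_MARK;
    }

    // Splits s into clumps: one base code point plus any combining marks
    // that follow it. Indices are always code unit indices.
    static List<String> clumps(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int start = i;
            // Step over the base; charCount is 2 for a surrogate pair.
            i += Character.charCount(s.codePointAt(i));
            // Step over any trailing combining marks.
            while (i < s.length() && isCombining(s.codePointAt(i))) {
                i += Character.charCount(s.codePointAt(i));
            }
            out.add(s.substring(start, i));
        }
        return out;
    }

    public static void main(String[] args) {
        // "e" + U+0301 COMBINING ACUTE ACCENT, then U+1D11E, then "x".
        String s = "e\u0301\uD834\uDD1Ex";
        System.out.println(s.length());       // 5 code units
        System.out.println(clumps(s).size()); // 3 clumps
    }
}
```

The string itself stays a plain sequence of 16-bit code units throughout; only
the iteration logic knows about clumps.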
Waldemar