> The current 16-bit character strings are sometimes used to store non-Unicode
> binary data and can be used with non-Unicode character encodings with up to
> 16-bit chars. 21 bits is sufficient for Unicode but perhaps is not enough
> for other useful encodings. 32-bit seems like a plausible unit.
How would an eight-digit \u escape sequence work from an implementation standpoint? I'm assuming most implementations right now use 16-bit unsigned values as the individual elements of a String. If we allow arbitrary 32-bit values to be placed into a String, how would you make that work? There seem to be only a few options:

a) Change the implementation to use 32-bit units.

b) Change the implementation to use either 16-bit or 32-bit units as needed, with some sort of internal flag that specifies the unit size for an individual string.

c) Encode the 32-bit values somehow as a sequence of 16-bit values.

If you want to allow full generality, it seems like you'd be stuck with option a or option b. Is there really enough value in doing this?

If, on the other hand, the idea is just to make it easier to include non-BMP Unicode characters in strings, you can accomplish this by making a long \u sequence just be shorthand for the equivalent sequence in UTF-16: \u10ffff would be exactly equivalent to \udbff\udfff. You don't have to change the internal format of the string, the indexes of individual characters stay the same, etc.

--Rich Gillam
  Lab126
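A minimal sketch of the surrogate-pair expansion described above, assuming a helper named toUTF16 (the name and details are illustrative, not part of any proposal); it shows how a long \u escape could be lowered to its UTF-16 equivalent at parse time:

    // Illustrative only: expand a code point into the UTF-16 units that a
    // long \u escape would be shorthand for.
    function toUTF16(codePoint) {
      // Guard against values outside the Unicode range.
      if (codePoint < 0 || codePoint > 0x10FFFF) {
        throw new RangeError("Invalid code point: " + codePoint);
      }
      if (codePoint <= 0xFFFF) {
        // BMP code points occupy a single 16-bit unit.
        return String.fromCharCode(codePoint);
      }
      // Supplementary code points become a high/low surrogate pair.
      var offset = codePoint - 0x10000;
      var high = 0xD800 + (offset >> 10);   // top 10 bits
      var low  = 0xDC00 + (offset & 0x3FF); // bottom 10 bits
      return String.fromCharCode(high, low);
    }

    toUTF16(0x10FFFF) === "\udbff\udfff"; // true
    toUTF16(0x10FFFF).length;             // 2, so string indexes are unchanged

Under this reading the escape is purely a source-level convenience: the stored string is still a sequence of 16-bit units, exactly as if the two surrogate escapes had been written out by hand.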

