Bruce Thomson <[EMAIL PROTECTED]> wrote:

> But to conserve file space, it would probably be best to allow
> intermixing of 128-bit characters with ASCI text. UTF-8 continues
> to be the way to do this, since it just a compression scheme that
> does not really depend on the fact that Unicode is currently
> limited to 32 bits. It could just as easily be extended to work
> with much larger character sets.

This is not even close to true. UTF-8 is very much dependent on the 32-bit architecture of Unicode, and in fact is constrained to 31-bit code points. A quick check of the "10xxxxxx 10xxxxxx..." chart in RFC 2279, or in the Unicode Standard or ISO/IEC 10646, will confirm that.

And the word "currently," as used to refer to either the 21-bit or the 32-bit limit of Unicode/10646, is being used way too cavalierly. Unicode is not going to be expanded beyond U+10FFFD, and nobody can think of a non-whimsical reason why it should be.

-Doug Ewell
 Fullerton, California
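
For anyone who wants to check the chart arithmetic rather than take it on faith, here is a minimal Python sketch of the bit counting behind the RFC 2279 sequence forms. The function names payload_bits and max_code_point are made up for illustration and do not come from any library. It assumes only the layout shown in that chart: a lead byte with as many leading 1 bits as there are bytes in the sequence, followed by "10xxxxxx" continuation bytes.

```python
# Payload bits carried by each UTF-8 sequence length in the RFC 2279 chart.
# Hypothetical helper names; this is an illustration, not a UTF-8 codec.

def payload_bits(length: int) -> int:
    """Number of code point bits a sequence of `length` bytes can carry."""
    if not 1 <= length <= 6:
        raise ValueError("RFC 2279 defines sequences of 1 to 6 bytes only")
    if length == 1:
        return 7  # 0xxxxxxx
    # Lead byte: `length` one-bits, then a zero bit, leaving 7 - length bits.
    # Each of the length - 1 continuation bytes (10xxxxxx) carries 6 bits.
    return (7 - length) + 6 * (length - 1)


def max_code_point(length: int) -> int:
    """Largest value representable in a sequence of `length` bytes."""
    return (1 << payload_bits(length)) - 1


for n in range(1, 7):
    print(n, payload_bits(n), hex(max_code_point(n)))
```

The longest form, six bytes, carries 1 + 5 * 6 = 31 bits, which is where the 31-bit ceiling comes from. The four-byte form carries 21 bits, which is enough for everything up to U+10FFFF.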
