"Peter S. Housel" wrote:
> > o Complete disdain for ISO-10646 being 32 bits, when 16
> >   of them are never anything but 0, and were put there just
> >   so that people could grep -v other people's languages out
> >   of documents
> >
> > o I'll believe Hieroglyphics and Linear B when I see the
> >   fonts and the programs that use them. Dead languages
> >   pretty much justify purpose-built linguistics software
> >   anyway.
>
> If you were a MathML user, or had a Chinese name using an obscure character,
> you would probably feel differently.

Why? Have the Chinese sent representatives to an international
standards body to get code pages other than 0 filled in with
these characters? Have the MathML users?

Basically, it's not necessary to have bits to represent these
code points until they are part of a standard character set.
The entire point of Unicode was to provide round-trip capability
between character sets.

For MathML, you can actually unify the code points with Zapf
or other characters that don't exist simultaneously in any
character set. Alternatively, you could use a "private use"
area.

> > o A desire for raw storage of Unicode, rather than UTF-8 or
> >   UTF-7 encoding. This last one is:
>
> You still need at least 21 bits to have "raw storage of Unicode". With
> anything less, either UTF-16 surrogates or UTF-8 multi-byte encodings have
> to be used. With a 16-bit wchar_t, even if I personally don't have any text
> that uses characters beyond the BMP, I still have to write my code to
> account for surrogates.

Unicode 3.2.0 is not an ISO/IEC standard. It's a political thing.
You might have an argument for ISO-10646-2:2001; however, "Klingon"
is not a script I'm really worried about. 8-).

> > o People might accept doubling data size for the benefit
> >   of internationalization. They aren't going to accept
> >   a random multiplier between 1 and 5.
>
> I suspect UTF-16 doesn't compress very well using standard tools, and it is
> subject to byte-order difficulties. (That goes double for UTF-32, of
> course.) wchar_t probably shouldn't be directly used for storage.

Anything larger than a byte has byte order problems; that was one
of the original rationales for UTF-8 encoding.

-- Terry

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message