Hi, ISO 10646 Amendment 1 has formally committed that the UCS-4 codespace ENDS with UTF-16 Plane 16.
See the attached verbatim quote from Unicode 3.2.

-----------------------------
[verbatim excerpt from Unicode 3.2 - http://www.unicode.org/unicode/reports/tr28]

VIII Relation to ISO/IEC 10646

ISO/IEC 10646 is a multi-part standard. Part 1, published as ISO/IEC 10646-1:2000(E), covers the Architecture and Basic Multilingual Plane. Part 2, published as ISO/IEC 10646-2:2001(E), covers the supplementary planes. Amendment 1 to Part 1 makes a few modifications to the architecture of 10646 and adds about a thousand characters to the BMP. Unicode 3.2 contains all of the characters of Amendment 1, including the two characters of Amendment 1 that had already been added to Unicode 3.1.

With the publication of Amendment 1 to ISO/IEC 10646-1:2000 and the Unicode Standard, Version 3.2, the two standards are fully synchronized. The Unicode Consortium and ISO/IEC JTC1/SC2/WG2 are committed to maintaining the synchronization between the two standards.

Notable among the architectural changes to ISO/IEC 10646 approved in Amendment 1 are:

The range of characters available for private use has been restricted to those characters accessible via UTF-16, and the intent not to encode characters past Plane 16 has been clarified. This guarantees the interoperability of UTF-8 and UTF-16, and the equivalence of UTF-32 and UCS-4.

The definition of UCS short identifiers has been modified and UCS sequence identifiers have been added. This brings 10646 in line with Unicode conventions for representing characters and sequences of characters.

The clause reserving characters for internal use has been updated, so that the 10646 specification is in line with the Unicode specification of noncharacters, including the noncharacters at U+FDD0..U+FDEF.
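A side note of my own, not part of the quoted excerpt: the phrase "accessible via UTF-16" fixes the ceiling at U+10FFFF because of how surrogate pairs are combined. The Python sketch below illustrates that arithmetic. The function names `from_surrogates` and `to_surrogates` are made up for this illustration and do not come from the standard or from any library.

```python
from typing import Tuple

# UTF-16 surrogate ranges.
HIGH_START, HIGH_END = 0xD800, 0xDBFF  # high (lead) surrogates
LOW_START, LOW_END = 0xDC00, 0xDFFF    # low (trail) surrogates


def from_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into a single code point."""
    if not (HIGH_START <= high <= HIGH_END and LOW_START <= low <= LOW_END):
        raise ValueError("not a valid surrogate pair")
    # 10 bits come from each surrogate; the result is offset past the BMP.
    return 0x10000 + ((high - HIGH_START) << 10) + (low - LOW_START)


def to_surrogates(code_point: int) -> Tuple[int, int]:
    """Split a supplementary-plane code point into a surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the supplementary planes")
    offset = code_point - 0x10000
    return HIGH_START + (offset >> 10), LOW_START + (offset & 0x3FF)


# The largest possible pair yields the last code point UTF-16 can reach.
max_code_point = from_surrogates(HIGH_END, LOW_END)
print(hex(max_code_point))   # 0x10ffff
print(max_code_point >> 16)  # plane number: 16
```

Since the largest pair (0xDBFF, 0xDFFF) maps to 0x10FFFF, anything assigned beyond Plane 16 would have no UTF-16 representation, which is the interoperability point the excerpt makes.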
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Monday, September 23, 2002 7:47 PM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: Linux and UTF8 filenames

> From: [EMAIL PROTECTED]
> >From: Edward Cherlin <[EMAIL PROTECTED]>
> >The internal coding used by any software is totally irrelevant to any
> >other software, or to users. UTF-16 stores BMP CJK characters in two
> >bytes each, whereas UTF-8 requires three. This saves some space in a
> >number of tables. It isn't a big deal, but it is a very reasonable
> >design choice.
>
> If the unicode standard is extended beyond 0x10FFFF

It won't be extended beyond 0x10FFFF for sure, unless all of the current Unicode Technical Committee (UTC) voting members are replaced with pro-beyond-0x10FFFF-assignment people ;-).

> utf-16 is unsuitable for protocols imo

I do not necessarily disagree with your opinion, and I do not necessarily recommend the use of UTF-16, or even hardwiring to UTF-8/16/32, to people in general. For me, surrogate support is not a big deal, and the other Unicode complexities are not too bad to deal with, so I chose to go with the UTF-16 hardwiring approach for my project. ;-)

--
hiura@{freestandards.org,li18nux.org,unicode.org,sun.com}
Chair, Li18nux/Linux Internationalization Initiative, http://www.li18nux.org
Board of Directors, Free Standards Group, http://www.freestandards.org
Architect/Sr. Staff Engineer, Sun Microsystems, Inc, USA
eFAX: 509-693-8356
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
