RE: Linux and UTF8 filenames [0x10FFFF UCS limit]

McDonald, Ira Tue, 24 Sep 2002 13:30:03 -0700

Hi,

ISO 10646 Amendment 1 has formally committed that the UCS-4 codespace
ENDS with Plane 16, the last plane reachable through UTF-16 (U+10FFFF).


See the attached verbatim quote from Unicode 3.2.

-----------------------------
[verbatim excerpt from Unicode 3.2 
- http://www.unicode.org/unicode/reports/tr28]

VIII Relation to ISO/IEC 10646

ISO/IEC 10646 is a multi-part standard. Part 1, published as ISO/IEC
10646-1:2000(E), covers the Architecture and Basic Multilingual Plane. Part
2, published as ISO/IEC 10646-2:2001(E), covers the supplementary planes.
Amendment 1 to Part 1 makes a few modifications to the architecture of 10646
and adds about a thousand characters to the BMP. 

Unicode 3.2 contains all of the characters of Amendment 1, including the two
characters of Amendment 1 that had already been added to Unicode 3.1. With
the publication of Amendment 1 to ISO/IEC 10646-1:2000 and the Unicode
Standard, Version 3.2, the two standards are fully synchronized. 

The Unicode Consortium and ISO/IEC JTC1/SC2/WG2 are committed to maintaining
the synchronization between the two standards. 

Notable among the architectural changes to ISO/IEC 10646 approved in
Amendment 1 are: 

The range of characters available for private use has been restricted to
those characters accessible via UTF-16, and the intent not to encode
characters past Plane 16 has been clarified. This guarantees the
interoperability of UTF-8 and UTF-16, and the equivalence of UTF-32 and
UCS-4. 

The definition of UCS short identifiers has been modified and UCS sequence
identifiers have been added. This brings 10646 in line with Unicode
conventions for representing characters and sequences of characters.  

The clause reserving characters for internal use has been updated, so that
the 10646 specification is in line with the Unicode specification of
noncharacters, including the noncharacters at U+FDD0..U+FDEF.
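
[Illustration, not part of the Unicode 3.2 text.] The "accessible via
UTF-16" limit in the excerpt can be checked with a little arithmetic. The
Python sketch below assumes only the standard surrogate ranges
(D800..DBFF high, DC00..DFFF low). The function names are made up for
this note and do not come from any library. It shows that the largest
surrogate pair decodes to exactly 0x10FFFF, and it tests for the
noncharacters, including the U+FDD0..U+FDEF block.

```python
# Surrogate ranges used by UTF-16 (standard values).
HIGH_MIN, HIGH_MAX = 0xD800, 0xDBFF
LOW_MIN, LOW_MAX = 0xDC00, 0xDFFF
MAX_CODE_POINT = 0x10FFFF  # last code point of Plane 16


def decode_surrogate_pair(high, low):
    """Combine a high and a low surrogate into one code point."""
    if not (HIGH_MIN <= high <= HIGH_MAX and LOW_MIN <= low <= LOW_MAX):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - HIGH_MIN) << 10) + (low - LOW_MIN)


def encode_surrogate_pair(cp):
    """Split a supplementary-plane code point into (high, low) surrogates."""
    if not (0x10000 <= cp <= MAX_CODE_POINT):
        raise ValueError("code point is outside the supplementary planes")
    offset = cp - 0x10000
    return HIGH_MIN + (offset >> 10), LOW_MIN + (offset & 0x3FF)


def is_noncharacter(cp):
    """True for U+FDD0..U+FDEF and the last two code points of each plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


# The largest possible pair lands exactly on the end of Plane 16.
largest = decode_surrogate_pair(HIGH_MAX, LOW_MAX)
print(hex(largest))
```

Since no surrogate pair can express anything above that value, a
codespace that stops at 0x10FFFF is exactly the set of code points that
UTF-8, UTF-16 and UTF-32 can all represent.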

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Monday, September 23, 2002 7:47 PM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: Linux and UTF8 filenames


> From: [EMAIL PROTECTED]
> >From: Edward Cherlin <[EMAIL PROTECTED]>
> >The internal coding used by any software is totally irrelevant to any 
> >other software, or to users. UTF-16 stores BMP CJK characters in two 
> >bytes each, whereas UTF-8 requires three. This saves some space in a 
> >number of tables. It isn't a big deal, but it is a very reasonable 
> >design choice.
>
> If the unicode standard is extended beyond 0x10FFFF

It won't be extended beyond 0x10FFFF for sure, unless all of the
current Unicode Technical Committee (UTC) voting members are replaced
with pro-beyond-0x10FFFF-assignment people ;-).

> utf-16 is unsuitable for protocols imo

I do not necessarily disagree with your opinion. I do not necessarily
recommend the use of UTF-16, nor do I necessarily recommend that people
in general hardwire to UTF-8/16/32 at all.

For me, surrogate support is not a big deal, and the other Unicode
complexities are not too bad to deal with, so I chose to go with the 
UTF-16 hardwiring approach for my project. ;-)
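
[Illustration added to this thread, not part of the original message.]
The storage trade-off quoted above (two bytes per BMP CJK character in
UTF-16 versus three in UTF-8) and the cost of surrogates can be measured
directly. This Python sketch uses only the built-in codecs. The helper
name encoded_sizes and the two sample characters are chosen for this
example.

```python
def encoded_sizes(text):
    """Return the byte length of text in UTF-8, UTF-16 and UTF-32 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


bmp_cjk = "\u6f22"            # U+6F22, a CJK ideograph in the BMP
supplementary = "\U00020000"  # U+20000, first code point of Plane 2

for label, ch in (("BMP CJK", bmp_cjk), ("Plane 2", supplementary)):
    print(label, encoded_sizes(ch))
```

The BMP ideograph takes 3 bytes in UTF-8 and 2 in UTF-16. The Plane 2
character takes 4 bytes in both, because UTF-16 needs a surrogate pair
for it.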

--
hiura@{freestandards.org,li18nux.org,unicode.org,sun.com} 
Chair, Li18nux/Linux Internationalization Initiative, http://www.li18nux.org
Board of Directors, Free Standards Group,       http://www.freestandards.org
Architect/Sr. Staff Engineer, Sun Microsystems, Inc, USA  eFAX: 509-693-8356


--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
