David Starner wrote on 2001-05-11 17:01 UTC:
> On Fri, May 11, 2001 at 05:42:49PM +0100, Markus Kuhn wrote:
> > Roozbeh Pournader wrote on 2001-05-11 16:02 UTC:
> > > 2. Allowing 5 and 6-byte UTF-8, which Unicode 3.1 forbids.
> >
> > http://www.cl.cam.ac.uk/~mgk25/unicode.html#ucsutf
> >
> > Neither of these is a deviation from ISO 10646, which has a somewhat
> > broader scope than Unicode and is (at least in the context of
> > communication with ISO 6429 terminals) the preferred reference.
>
> Are you sure this isn't a deviation from ISO 10646? I thought they
> removed the 5 and 6-byte UTF-8 sequences in the latest stuff.
Not in ISO/IEC 10646-1:2000.
The rumours about UTF-8 being restricted to 4 bytes are just a Fear,
Uncertainty and Doubt strategy by the dark lords of the UTF-16 cult and
their 16-bit Win32 religion.
The private use groups at the far end of the 31-bit UCS are perfectly
good and useful for potential future schemes that guarantee, say, round-trip
compatibility with various encodings that have up to 2^29 code positions (full
ISO 2022/ISO IR, keysyms, etc.). There is not the slightest reason for
POSIX implementors not to support the full 6-byte version of UTF-8 as
defined in ISO 10646-1:2000.
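To make the 6-byte point concrete, here is a minimal Python sketch of the
original 31-bit UTF-8 form (as defined in ISO/IEC 10646-1:2000 and RFC 2279).
It assumes the private use groups are groups 0x60 to 0x7F, each group being
256 planes of 65536 positions. The names utf8_encode_ucs4 and
PRIVATE_GROUPS_FIRST/LAST are hypothetical, for illustration only. Python's
own str.encode("utf-8") stops at U+10FFFF and 4 bytes, so the encoding is
done by hand here.

```python
# Sketch of the original 31-bit UTF-8 encoding (1 to 6 bytes).
# NOT what str.encode("utf-8") does: that is capped at U+10FFFF / 4 bytes.

# (highest code position for this length, sequence length, lead-byte marker)
_FORMS = [
    (0x7F, 1, 0x00),
    (0x7FF, 2, 0xC0),
    (0xFFFF, 3, 0xE0),
    (0x1FFFFF, 4, 0xF0),
    (0x3FFFFFF, 5, 0xF8),
    (0x7FFFFFFF, 6, 0xFC),
]

# Assumption: private use groups are 0x60..0x7F (group = bits 30..24).
PRIVATE_GROUPS_FIRST = 0x60 << 24
PRIVATE_GROUPS_LAST = (0x7F << 24) | 0xFFFFFF


def utf8_encode_ucs4(cp: int) -> bytes:
    """Encode a 31-bit UCS code position as 1 to 6 bytes of UTF-8."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("outside the 31-bit UCS code space")
    # Pick the shortest form whose range contains cp.
    for limit, length, marker in _FORMS:
        if cp <= limit:
            break
    out = bytearray(length)
    # Fill continuation bytes from the end, 6 payload bits each (10xxxxxx).
    for i in range(length - 1, 0, -1):
        out[i] = 0x80 | (cp & 0x3F)
        cp >>= 6
    # Whatever high bits remain go into the lead byte.
    out[0] = marker | cp
    return bytes(out)
```

For example, utf8_encode_ucs4(0x60000000), the first position of the private
use groups, yields b"\xfd\xa0\x80\x80\x80\x80". Every position in those groups
lies above 0x3FFFFFF, so all 2^29 of them need the 6-byte form; a decoder
limited to 4 bytes simply cannot represent them.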
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/