David Starner wrote on 2001-05-11 17:01 UTC:
> On Fri, May 11, 2001 at 05:42:49PM +0100, Markus Kuhn wrote:
> > Roozbeh Pournader wrote on 2001-05-11 16:02 UTC:
> > > 2. Allowing 5 and 6-byte UTF-8, which Unicode 3.1 forbids.
> > 
> > http://www.cl.cam.ac.uk/~mgk25/unicode.html#ucsutf
> > 
> > Neither of these is a deviation of ISO 10646, which has a somewhat
> > broader scope than Unicode and is (at least in the context of
> > communication with ISO 6429 terminals) the preferred reference.
> 
> Are you sure this isn't a deviation of ISO 10646? I thought they
> removed the 5 and 6-byte UTF-8 sequences in the latest stuff.

Not in ISO/IEC 10646-1:2000.

The rumours about UTF-8 being restricted to 4 bytes are just a Fear,
Uncertainty and Doubt strategy by the dark lords of the UTF-16 cult and
their 16-bit Win32 religion.

The private use groups at the far end of the 31-bit UCS are perfectly
good and useful in potential future schemes to guarantee, say, roundtrip
compatibility with various encodings of up to 2^29 code positions (full
ISO 2022/ISO IR, keysyms, etc.). There is not the slightest reason for
POSIX implementors not to support the full 6-byte version of UTF-8 as
defined in ISO 10646-1:2000.
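For readers who have only seen the 4-byte form, here is a minimal sketch of what "the full 6-byte version of UTF-8" means in practice. It assumes the ISO 10646-1:2000 definition covering code values 0 to 0x7FFFFFFF (31 bits). The function names utf8_encode_ucs4 and utf8_decode_ucs4 are hypothetical and not taken from any library. They are hand-rolled because Python's own chr() and UTF-8 codec stop at U+10FFFF.

```python
# Sketch of the original (up to 6-byte) UTF-8 for 31-bit UCS values.
# Hypothetical helper names; not a library API.

# (exclusive upper bound, sequence length, lead-byte marker)
_FORMS = ((0x80, 1, 0x00), (0x800, 2, 0xC0), (0x10000, 3, 0xE0),
          (0x200000, 4, 0xF0), (0x4000000, 5, 0xF8), (0x80000000, 6, 0xFC))

# Smallest value that legitimately needs n bytes (index = n); used to reject overlong forms.
_MIN_FOR_LEN = (0, 0, 0x80, 0x800, 0x10000, 0x200000, 0x4000000)


def utf8_encode_ucs4(cp: int) -> bytes:
    """Encode one 31-bit UCS code value as 1 to 6 UTF-8 bytes."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("UCS-4 values are limited to 31 bits")
    for limit, n, lead in _FORMS:
        if cp < limit:
            break
    if n == 1:
        return bytes([cp])
    tail = []
    for _ in range(n - 1):
        tail.append(0x80 | (cp & 0x3F))  # continuation byte carries 6 bits
        cp >>= 6
    # Whatever bits remain go into the lead byte.
    return bytes([lead | cp] + tail[::-1])


def utf8_decode_ucs4(data: bytes) -> list:
    """Decode a byte string of 1- to 6-byte sequences, rejecting malformed input."""
    out, i = [], 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            n, cp = 1, b
        elif (b & 0xE0) == 0xC0:
            n, cp = 2, b & 0x1F
        elif (b & 0xF0) == 0xE0:
            n, cp = 3, b & 0x0F
        elif (b & 0xF8) == 0xF0:
            n, cp = 4, b & 0x07
        elif (b & 0xFC) == 0xF8:
            n, cp = 5, b & 0x03
        elif (b & 0xFE) == 0xFC:
            n, cp = 6, b & 0x01
        else:
            raise ValueError("invalid lead byte 0x%02X" % b)
        if i + n > len(data):
            raise ValueError("truncated sequence")
        for c in data[i + 1:i + n]:
            if (c & 0xC0) != 0x80:
                raise ValueError("missing continuation byte")
            cp = (cp << 6) | (c & 0x3F)
        if cp < _MIN_FOR_LEN[n]:
            raise ValueError("overlong encoding")
        out.append(cp)
        i += n
    return out
```

For values up to U+10FFFF the sketch produces the same bytes as Python's built-in codec; beyond that it simply continues the same bit pattern into 5- and 6-byte sequences, so 0x7FFFFFFF encodes as FD BF BF BF BF BF. This sketch does not treat surrogate code values specially.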

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/