Re: Linux console UTF-8 by default

Markus Kuhn Sat, 17 Jan 2004 11:20:25 -0800

Roozbeh Pournader wrote on 2004-01-11 14:15 UTC:
> On Sat, 2004-01-10 at 23:51, Edward H. Trager wrote:
> > I guess I was recalling (from http://www.cl.cam.ac.uk/~mgk25/unicode.html)
> > that six bytes allows encoding all possible 2^31 UCS code points, although
> > I suppose nothing above plane 1 has been defined.
> 
> 1. That page is a little out of date (although a wonderful resource).


I don't think there is anything out of date:

  "The definitions of UTF-8 in UCS and Unicode differed originally
  slightly, because in UCS, up to 6-byte long UTF-8 sequences were
  possible to represent characters up to U-7FFFFFFF, while in Unicode only
  up to 4-byte long UTF-8 sequences are defined to represent characters up
  to U-0010FFFF."

The 21-bit limit is clearly described there, after the reader has first
been given an introduction to UTF-8 that reflects its original ISO definition.
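To make the two limits concrete, here is a minimal Python sketch. It is not from the original message, and the names `utf8_length`, `encode_ucs_utf8` and the `unicode_only` switch are hypothetical. It implements the original UCS form of UTF-8, where the sequence length grows with the code point up to 6 bytes for U-7FFFFFFF, and optionally applies the Unicode ceiling of U-0010FFFF (at most 4 bytes). For simplicity it ignores the surrogate range U-D800..U-DFFF, which Unicode also excludes.

```python
# Largest code point encodable with each sequence length in the
# original UCS (ISO 10646) definition of UTF-8: up to 6 bytes.
UCS_LIMITS = [
    (0x7F, 1),
    (0x7FF, 2),
    (0xFFFF, 3),
    (0x1FFFFF, 4),
    (0x3FFFFFF, 5),
    (0x7FFFFFFF, 6),
]

UCS_MAX = 0x7FFFFFFF      # 31-bit UCS space
UNICODE_MAX = 0x10FFFF    # 21-bit Unicode limit: 4-byte sequences at most


def utf8_length(cp: int, unicode_only: bool = False) -> int:
    """Return the UTF-8 sequence length for code point cp (hypothetical helper).

    With unicode_only=True, code points above U-0010FFFF are rejected.
    Surrogates are not checked in this sketch.
    """
    limit = UNICODE_MAX if unicode_only else UCS_MAX
    if not 0 <= cp <= limit:
        raise ValueError(f"U-{cp:08X} is outside the encodable range")
    for maximum, length in UCS_LIMITS:
        if cp <= maximum:
            return length
    raise AssertionError("unreachable: cp already checked against UCS_MAX")


def encode_ucs_utf8(cp: int) -> bytes:
    """Encode cp using the original UCS UTF-8 scheme (1 to 6 bytes)."""
    n = utf8_length(cp)
    if n == 1:
        return bytes([cp])  # ASCII passes through unchanged
    # Lead byte: n high bits set, then a 0 bit, then the top payload bits.
    lead_marker = (0xFF << (8 - n)) & 0xFF
    tail = []
    for _ in range(n - 1):
        tail.append(0x80 | (cp & 0x3F))  # continuation byte: 10xxxxxx
        cp >>= 6
    return bytes([lead_marker | cp] + tail[::-1])
```

For code points up to U-0010FFFF (surrogates excluded), the output should agree with Python's built-in `str.encode("utf-8")`; the 5- and 6-byte forms exist only in the older UCS definition, e.g. `encode_ucs_utf8(0x7FFFFFFF)` yields `b'\xfd\xbf\xbf\xbf\xbf\xbf'`.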

Markus

-- 
Markus Kuhn, Computer Lab, Univ of Cambridge, GB
http://www.cl.cam.ac.uk/~mgk25/ | __oo_O..O_oo__


--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
