Roozbeh Pournader wrote on 2004-01-11 14:15 UTC:
> On Sat, 2004-01-10 at 23:51, Edward H. Trager wrote:
> > I guess I was recalling (from http://www.cl.cam.ac.uk/~mgk25/unicode.html)
> > that six bytes allows encoding all possible 2^31 UCS code points,
> > although I suppose nothing above plane 1 has been defined.
>
> 1. That page is a little out of date (although a wonderful resource).

I don't think there is anything out of date:

  "The definitions of UTF-8 in UCS and Unicode differed originally
  slightly, because in UCS, up to 6-byte long UTF-8 sequences were
  possible to represent characters up to U-7FFFFFFF, while in Unicode
  only up to 4-byte long UTF-8 sequences are defined to represent
  characters up to U-0010FFFF."

The 21-bit limit is definitely described after the reader first gets an
introduction to UTF-8 that reflects its original ISO definition.

Markus

-- 
Markus Kuhn, Computer Lab, Univ of Cambridge, GB
http://www.cl.cam.ac.uk/~mgk25/ | __oo_O..O_oo__

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
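
For readers who want the two limits side by side, here is a minimal
sketch of the sequence lengths implied by the passage quoted above. It
is not taken from the page itself: the names utf8_length, UNICODE_MAX
and UCS_MAX are made up for illustration, and the range table simply
follows the original 1-to-6-byte UTF-8 layout (7, 11, 16, 21, 26 and 31
payload bits).

```python
# Hypothetical helper: number of bytes UTF-8 needs for a code point,
# under either the Unicode limit or the original ISO 10646 (UCS) limit.

UNICODE_MAX = 0x10FFFF    # Unicode: at most 4-byte sequences
UCS_MAX = 0x7FFFFFFF      # original UCS: 2^31 code points, up to 6 bytes

# (largest code point representable, sequence length in bytes)
_RANGES = [
    (0x7F, 1),
    (0x7FF, 2),
    (0xFFFF, 3),
    (0x1FFFFF, 4),
    (0x3FFFFFF, 5),
    (0x7FFFFFFF, 6),
]


def utf8_length(code_point, limit=UNICODE_MAX):
    """Return the UTF-8 sequence length for code_point.

    Raises ValueError if code_point is negative or above `limit`.
    Surrogates are not treated specially in this sketch.
    """
    if not 0 <= code_point <= limit:
        raise ValueError("code point out of range: %#x" % code_point)
    for upper, length in _RANGES:
        if code_point <= upper:
            return length
    raise ValueError("code point out of range: %#x" % code_point)
```

With the default limit, utf8_length(0x10FFFF) gives 4 and anything above
U-0010FFFF is rejected; passing limit=UCS_MAX accepts values up to
U-7FFFFFFF, which needs 6 bytes.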
