On Sun, Jan 11, 2004 at 05:45:19PM +0330, Roozbeh Pournader wrote:
> On Sat, 2004-01-10 at 23:51, Edward H. Trager wrote:
> > I guess I was recalling (from http://www.cl.cam.ac.uk/~mgk25/unicode.html)
> > that six bytes allows encoding all possible 2^31 UCS code points, although
> > I suppose nothing above plane 1 has been defined.
>
> 1. That page is a little out of date (although a wonderful resource).
>
> 2. Although UCS theoretically allows 2^31 code points, it will never
> encode any character higher than U+10FFFF.

Well, you can never tell. I know that SC2/WG2 has said that they will
never allocate anything above the 21st bit, but then again they said they
would never reallocate characters, and then they did it anyway.
I would say: "be liberal in what you accept, and conservative in what
you generate", and thus accept valid UTF-8 up to the full 31 bits.
I also think there is code around that handles full UTF-8, so doing so
is not an extra burden.

Best regards
Keld
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
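
To make the "liberal in what you accept, conservative in what you generate" idea concrete, here is a rough sketch in Python. It assumes the original 1 to 6 byte UTF-8 layout, which covers code points up to 2^31 - 1. The function names decode_utf8_31bit and encode_conservative are made up for this illustration and do not come from any existing library. The decoder accepts the long forms but still rejects overlong encodings and surrogates; the encoder refuses anything above U+10FFFF.

```python
# Hypothetical sketch: names and structure are illustrative assumptions.

# (lead-byte mask, expected lead bits, continuation bytes, smallest legal value)
_FORMS = [
    (0x80, 0x00, 0, 0x0),
    (0xE0, 0xC0, 1, 0x80),
    (0xF0, 0xE0, 2, 0x800),
    (0xF8, 0xF0, 3, 0x10000),
    (0xFC, 0xF8, 4, 0x200000),
    (0xFE, 0xFC, 5, 0x4000000),
]


def decode_utf8_31bit(data):
    """Liberal decoder: return a list of code points below 2**31."""
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        for mask, bits, ncont, minimum in _FORMS:
            if lead & mask == bits:
                break
        else:
            # Stray continuation byte, or 0xFE/0xFF, which are never valid.
            raise ValueError("invalid lead byte at offset %d" % i)
        cp = lead & ~mask & 0xFF  # payload bits of the lead byte
        tail = data[i + 1:i + 1 + ncont]
        if len(tail) != ncont:
            raise ValueError("truncated sequence at offset %d" % i)
        for b in tail:
            if b & 0xC0 != 0x80:
                raise ValueError("bad continuation byte at offset %d" % i)
            cp = (cp << 6) | (b & 0x3F)
        if cp < minimum:
            raise ValueError("overlong encoding at offset %d" % i)
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError("surrogate code point at offset %d" % i)
        out.append(cp)
        i += 1 + ncont
    return out


def encode_conservative(cp):
    """Conservative encoder: emit nothing above U+10FFFF."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("refusing to generate code point %#x" % cp)
    return chr(cp).encode("utf-8")
```

With this split, a six-byte sequence such as FD BF BF BF BF BF decodes to 0x7FFFFFFF on input, while the output side never produces more than four bytes per character. Whether an application should pass such high values on or only tolerate them while parsing is a separate policy decision that this sketch does not settle.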
