On Sun, Jan 11, 2004 at 05:45:19PM +0330, Roozbeh Pournader wrote:
> On Sat, 2004-01-10 at 23:51, Edward H. Trager wrote:
> > I guess I was recalling (from http://www.cl.cam.ac.uk/~mgk25/unicode.html)
> > that six bytes allows encoding all possible 2^31 UCS code points, although
> > I suppose nothing above plane 1 has been defined.
> 
> 1. That page is a little out of date (although a wonderful resource).
> 
> 2. Although UCS theoretically allows 2^31 code points, it will never
> encode any character higher than U+10FFFF.

Well, you can never tell. I know that SC2/WG2 has said that they will
never allocate anything above the 21st bit, but then again they said
they would never reallocate characters, and then they did it anyway.

I would say: "be liberal in what you accept, and conservative in what
you generate", and thus accept valid UTF-8 up to the full 31 bits.
I also think there is code around that handles full UTF-8, so doing
this is no extra burden.
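
To make the idea concrete, here is a minimal sketch in Python, assuming
the original 1-to-6-byte UTF-8 form as described in RFC 2279. The names
decode_utf8_31bit and encode_utf8_conservative are hypothetical, made up
for illustration, and are not taken from any existing library. The
decoder accepts code points up to 2^31 - 1 while still rejecting overlong
forms and surrogates; the encoder refuses anything above U+10FFFF.

```python
# Hypothetical helper names; a sketch under the assumptions stated above.

# (lead mask, lead pattern, lead payload mask, sequence length, smallest
# code point that legitimately needs this length)
_FORMS = [
    (0x80, 0x00, 0x7F, 1, 0x0),
    (0xE0, 0xC0, 0x1F, 2, 0x80),
    (0xF0, 0xE0, 0x0F, 3, 0x800),
    (0xF8, 0xF0, 0x07, 4, 0x10000),
    (0xFC, 0xF8, 0x03, 5, 0x200000),
    (0xFE, 0xFC, 0x01, 6, 0x4000000),
]


def decode_utf8_31bit(data):
    """Liberal side: decode 1..6-byte UTF-8 into a list of code points."""
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        for mask, pattern, payload, length, minimum in _FORMS:
            if lead & mask == pattern:
                break
        else:
            # Stray continuation byte (0x80..0xBF) or 0xFE/0xFF.
            raise ValueError("invalid lead byte at offset %d" % i)
        tail = data[i + 1:i + length]
        if len(tail) != length - 1:
            raise ValueError("truncated sequence at offset %d" % i)
        cp = lead & payload
        for b in tail:
            if b & 0xC0 != 0x80:
                raise ValueError("bad continuation byte near offset %d" % i)
            cp = (cp << 6) | (b & 0x3F)
        if cp < minimum:
            raise ValueError("overlong encoding at offset %d" % i)
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError("surrogate code point at offset %d" % i)
        out.append(cp)
        i += length
    return out


def encode_utf8_conservative(code_points):
    """Conservative side: never emit anything above U+10FFFF."""
    for cp in code_points:
        if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError("refusing to encode 0x%X" % cp)
    return "".join(chr(cp) for cp in code_points).encode("utf-8")
```

With this sketch, decode_utf8_31bit(b"\xfd\xbf\xbf\xbf\xbf\xbf") yields
the single code point 0x7FFFFFFF, while encode_utf8_conservative raises
ValueError for that same value.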

Best regards
Keld

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
