RE: Revision of UTF-8 history in draft-yergeau-rfc2279bis-05.txt

Henry Spencer Sat, 14 Jun 2003 13:53:37 -0700

On Thu, 12 Jun 2003, Murray Sargent wrote:
>A key point so far missing in this thread and as far as I can tell in 
>Markus's paper is that UTF-8 is a subset of FSS-UTF and not the full 
>FSS-UTF.


No, it is an evolved (and renamed) version of FSS-UTF, rather than a
distinct encoding. 

>In particular, UTF-8 has the restrictions:
>1. Shortest UTF-8 form for a 32-bit value is always used; longer forms 
>are illegal

Already present in the original FSS-UTF proposal from Ken Thompson.  (The
omission of this rule from some later documents is regrettable but is
essentially an error in those documents.)
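
As a rough illustration of the shortest-form rule (a sketch only, not
taken from Thompson's proposal or any draft): the helper names below are
hypothetical, and the length boundaries assume the six-byte layout
covering values up to 0x7FFFFFFF.

```python
def shortest_len(cp):
    """Bytes in the shortest form of cp, assuming the six-byte layout."""
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    if cp < 0x200000:
        return 4
    if cp < 0x4000000:
        return 5
    return 6  # assumes cp <= 0x7FFFFFFF


def is_overlong(encoded_len, cp):
    """True if cp was encoded in more bytes than the shortest form needs."""
    return encoded_len > shortest_len(cp)


def strict_decode_ok(data):
    """True if Python's strict UTF-8 decoder accepts the byte string."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

For instance, the bytes C0 AF would carry U+002F ("/") in two bytes where
one suffices, so is_overlong(2, 0x2F) is true, and a strict decoder such
as Python's rejects the sequence.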

>2. The surrogate codes 0xD800 - 0xDFFF are illegal in UTF-8 form
>3. Only the first 17 planes are legal...

These are later, evolutionary changes in FSS-UTF/UTF-8 to match changes in
Unicode, not fundamental differences between two different encodings.
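
To make restrictions 2 and 3 concrete, here is a minimal sketch of the
range check they amount to. The function names are hypothetical, and the
upper limit assumes "17 planes" means planes 0 through 16, i.e. code
points up to 0x10FFFF.

```python
MAX_CODE_POINT = 17 * 0x10000 - 1  # 0x10FFFF, last code point of plane 16


def is_scalar_value(cp):
    """True if cp may be encoded under restrictions 2 and 3."""
    if cp < 0 or cp > MAX_CODE_POINT:
        return False  # outside the first 17 planes
    if 0xD800 <= cp <= 0xDFFF:
        return False  # surrogate code, reserved for UTF-16 pairs
    return True


def strict_decode_ok(data):
    """True if Python's strict UTF-8 decoder accepts the byte string."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

Python's own decoder applies the same limits: ED A0 80 (which would be
0xD800) and F4 90 80 80 (which would be 0x110000) are both rejected,
while F4 8F BF BF (0x10FFFF) is accepted.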

                                                          Henry Spencer
                                                       [EMAIL PROTECTED]


--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
