Ed Hart wrote:
> I'd like to make an observation.  According to Markus Kuhn, Ken Thompson
> designed UTF-8.  This is not quite true.  Ken Thompson (according to
> Markus) designed FSS-UTF.

I think you are conflating two slightly different algorithms that,
unfortunately, both went under the "FSS-UTF" moniker.  One is the original
X/Open proposal, which the Plan 9 guys found imperfect (not
self-synchronizing).  The other is the modification that, according to
Markus' references, Ken Thompson designed on a diner placemat and sent in an
email dated Fri Sep  4 03:37:39 EDT 1992.  X/Open picked up this modified
algorithm and pushed it through standardization.  In the process, the name
migrated from FSS-UTF to UTF-2 to UTF-8, but Thompson's email shows clearly
that what we call UTF-8 today is what he proposed back then.

> As I recall, the ISO/IEC 10646 Working Group was aware of the X-Open,
> FSS-UTF.

Which one?  Before or after Thompson's proposal?

> UTF-8 accounted for the surrogates of UTF-16 by forcing a
> conversion of any text encoded with UTF-16 to UCS-4 (32-bit 
> form) and then
> converting text encoded in UCS-4 to UTF-8.

Thompson's algorithm certainly didn't convert from UTF-16, which didn't
exist at the time.  It is clear from his email that characters ("UCS values")
between hex 10000 and 10FFFF are converted to 4-byte sequences exactly as in
today's UTF-8, and not to 2x3-byte sequences as in CESU-8.
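
To make the difference concrete, here is a minimal Python sketch of the two
encodings for a single UCS value.  The function names utf8_encode and
cesu8_encode are mine, chosen for illustration; this is not code from
Thompson's email or from any standard, and the UTF-8 helper deliberately
skips the surrogate check a real encoder would perform so that the CESU-8
helper can reuse it.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one UCS value with the UTF-8 bit layout (up to hex 10FFFF).

    Illustrative only: unlike a conforming encoder, this does not reject
    the surrogate range D800..DFFF, because cesu8_encode below reuses it.
    """
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp <= 0x10FFFF:
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError("value out of range")


def cesu8_encode(cp: int) -> bytes:
    """Encode one UCS value CESU-8 style.

    Values at or above hex 10000 are first split into a UTF-16 surrogate
    pair, and each surrogate is then written as its own 3-byte sequence.
    """
    if cp < 0x10000:
        return utf8_encode(cp)
    v = cp - 0x10000
    high = 0xD800 | (v >> 10)    # leading surrogate
    low = 0xDC00 | (v & 0x3FF)   # trailing surrogate
    return utf8_encode(high) + utf8_encode(low)


print(utf8_encode(0x10000).hex(" "))   # f0 90 80 80
print(cesu8_encode(0x10000).hex(" "))  # ed a0 80 ed b0 80
```

The first line of output is the single 4-byte sequence; the second is the
2x3-byte form that goes through surrogates.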

-- 
François Yergeau
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
