On Fri, 6 Dec 2002, Keld Jørn Simonsen wrote:
> Actually it is funny that you call it Unicode. UTF-8 clearly comes from
> the 10646 side of UCS; Unicode did not invent it at all...
It did not come from 10646 either; it came from the *Unix* side of the
house, specifically from X/Open. And my understanding is that it was
originally specifically an encoding for Unicode (although the distinction
quickly became academic because of the conversion of 10646 into a Unicode
clone).
Nobody except some standards zombies cared about encoding 10646, or indeed
about any aspect of 10646; Unicode was the standard that the real world
was clearly signing up for. Which is why the 10646 committee, seeing the
writing on the wall, abandoned its own efforts and aligned its standard
with Unicode.
Some of the Unicode standards guys were dead-set against any encoding
except plain 16-bit (but which byte order? :-)), but potential *users* of
Unicode were much more pragmatic. UTF-8 originally came out of the desire
for a backward-compatible encoding for use in Unix filenames.
In any case, Unicode is much the more widely known name, and much the more
readily available standard, and (as others have noted) also comes with a
lot of relevant supplementary information that 10646 lacks.
> The way 10646 is coming to Linux is also very much
> with the support of the ISO 14651 sorting standard and the ISO
> TR 14652 locale standard.
My understanding is that an ISO TR (Technical Report), by definition, is not a standard.
> I think the proper way to characterize what we do now in Linux is
> to say ISO 10646, and probably mention Unicode in parentheses the first
> time it appears.
The pragmatic, and historically correct, way is the reverse. ISO 10646
delivers the ISO stamp (stomp? :-)) of approval for Unicode... but the
standard you will find on the shelves of the people who do the work is
labelled "Unicode".
Henry Spencer
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/