Oh, no, not more Unicode bashing from Keld!
In fact, the Unicode consortium is much more open to input
from the public than is ISO. Anyone can submit suggestions to
the Unicode consortium, and have them informally discussed
on the e-mail list (something which simply does not happen for
10646 per se; the 10646 e-mail list was shut down years ago
since there was no traffic on it, and the SC2/WG2 list carries
only document announcements, no discussion at all).
For more formal consideration, you need to submit a position
paper, just as for ISO's WG2. But anyone can do that. You need
not be a member of any kind to do so. In principle that is true
for ISO('s SC2/WG2) too, but many opt to submit to Unicode,
often because some of the suggestions are about matters that
are not handled in 10646 at all.
The encoding forms are "synchronised" between Unicode and
10646. To the extent they are not identical, Unicode has the
better (stricter) specifications. Therefore the Unicode version
was taken as the normative reference when the RFC on UTF-8
was updated.
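
(As an aside of mine, not something from the earlier mails: "stricter"
here means, among other things, that a conforming UTF-8 decoder must
reject ill-formed sequences such as overlong encodings and encoded
surrogates. A small Python 3 sketch of that behaviour:

    ok = "é".encode("utf-8")            # b'\xc3\xa9', the well-formed encoding
    print(ok.decode("utf-8"))           # é

    for bad in (b"\xc0\xaf",            # overlong encoding of '/'
                b"\xed\xa0\x80"):       # encoded surrogate U+D800
        try:
            bad.decode("utf-8")
        except UnicodeDecodeError as e:
            print("rejected:", bad, e.reason)

Python's UTF-8 codec follows the Unicode/RFC 3629 rules, so both
ill-formed sequences above are rejected rather than silently decoded.)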
It is true that 10646 does not specify any equivalence between
strings. But I don't see why you consider that a good thing. Also,
there are only four normal forms, not a myriad (=10000, to be
nitpicking). They have some problems, true, but I would not
consider them kludges. Neither does the W3C, which uses NFC as
the basis for its normal form (mostly for "early normalisation"
of XML-based documents), nor does the IETF, which uses NFKC as
the basis for the "nameprep" normalisation for internationalised
domain names.
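
(To make the four forms concrete, an illustration of mine rather than
something from the original thread: Python's unicodedata module
implements all of NFC, NFD, NFKC and NFKD:

    import unicodedata

    precomposed = "\u00e9"        # é as one code point
    decomposed  = "e\u0301"       # e + COMBINING ACUTE ACCENT

    print(precomposed == decomposed)                                 # False
    print(unicodedata.normalize("NFC", decomposed) == precomposed)   # True
    print(unicodedata.normalize("NFD", precomposed) == decomposed)   # True

    # NFKC additionally folds compatibility characters, e.g. the fi ligature:
    print(unicodedata.normalize("NFKC", "\ufb01"))                   # fi

NFC and NFD only compose/decompose canonically; the K forms also apply
the compatibility mappings, which is why nameprep builds on NFKC.)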
Many of the things that Unicode specifies have been in quite
some flux, and the ISO WG does not have the bandwidth to
deal with them, nor would they be able to deal with them in a
timely manner, even if they had more resources.
I have the feeling that Keld feels sidelined, since he has
mostly engaged himself in the ISO WGs. And he finds that not
only is much of the work carried out in the Unicode consortium,
where he is not engaged, but his comments and standardisation
work (usually directed to some ISO WG) are often criticized by
people engaged in the Unicode consortium (as well as others).
Keld is of course welcome to engage himself in the work on
Unicode, and to the extent that his comments and suggestions
are taken as reasonable by others engaged, I am sure that they
will be welcomed. If they aren't taken as reasonable, of course
they will be criticized.
As for keyboards, to get back to an issue earlier in this thread,
I would agree with the approach of having two kinds of keyboards,
"live" keyboards that can generate (some selected) combining
characters directly, to be typed after the base character, and
"dead" keyboards that uses "dead" keys and generate mostly,
though not exclusively, precomposed characters. The latter
require some extra tables (for XKB and MacOS X, which I have
prepared new keyboard layouts for; to be tested). (It's my
understanding that the "console" doesn't use XKB, but that is
a separate issue.) I don't think it would be such a good idea to
make the keyboard system perform normalisation (to NFC say).
Not just because current keyboard systems don't do so, but
because such a normaliser would only "see" what was recently typed, not
where that text goes. So the result need not be in any normal
form, even if each piece of text from the keyboard is in NFC.
Whether to use the "live" keyboard or the "dead" keyboard layout
(for a particular language), when both are reasonable and available,
should be up to the individual user. Note that the "live" ones
are more general: you can apply a combining character sequence
to any base character, not just those combinations that are in
a prepared list for "dead" key support.
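
(A quick illustration of the normalisation point above, in Python;
the variable names are just made up for the example:

    import unicodedata

    def is_nfc(s):
        return unicodedata.normalize("NFC", s) == s

    already_there = "e"        # text already in the application's buffer
    just_typed    = "\u0301"   # COMBINING ACUTE ACCENT from the keyboard

    print(is_nfc(already_there), is_nfc(just_typed))   # True True
    print(is_nfc(already_there + just_typed))          # False

Each piece is in NFC on its own, but the concatenation is not, since
NFC would compose the two into U+00E9. A normaliser in the keyboard
driver cannot see the text it is being inserted into, so it cannot
prevent this.)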
As for "cat", "cat" outputs bytes, and has no idea about characters
or character encodings. If you "cat" a file to a terminal, the text
in the file (assuming it contains text) had better be in the
encoding expected by the terminal. Otherwise, pipe it through iconv (assuming
you know the names of the two encodings involved).
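
A rough Python 3 equivalent of that iconv step (my sketch, assuming the
file really is in ISO 8859-1 and the terminal expects UTF-8; the filename
handling is only for the example):

    import sys

    # Read raw bytes (as cat would), reinterpret them as ISO 8859-1 text,
    # and write them out again as UTF-8 bytes for the terminal.
    with open(sys.argv[1], "rb") as f:
        data = f.read()
    sys.stdout.buffer.write(data.decode("iso-8859-1").encode("utf-8"))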
/kent k
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Keld
> Jørn Simonsen
> Sent: Monday, August 11, 2003 3:04 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Linux console internationalization
>
>
> On Mon, Aug 11, 2003 at 10:31:16AM +0100, Markus Kuhn wrote:
>
> > ISO 10646 lacks much of the useful information, guidelines,
> databases,
> > technical reports, and subsetting information that the
> Unicode Standard
> > provides. ISO 10646 mentions briefly three implementation
> levels, which
> > look not too useful in practice and appear a bit like they
> have been put
> > in on short notice to shut up someone in the committee who
> wasn't happy
> > with combining characters.
>
> yes, Unicode have more information than ISO. Unicode has chosen not to
> forward these specs for ISO standardisation to gain control over the
> specification, AFAICT. It is like the old "embrace and
> enhance" policy
> that many big companies have so big success in doing.
>
> One of the reasons for the non-submissions of Unicode specs to ISO may
> be that many of them are kludges, like having more than one
> representation for a given character (like the fully composed and the
> combining characters, and then the myriads of normalization forms then
> required to make sense of it) and the 16 bit hack of UTF-16.
>
> best regards
> keld
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/