On Sat, Nov 03, 2001 at 04:04:57PM +0200, Nerijus Baliunas wrote:
> On 31 Oct 2001 09:49:21 -0800 "H. Peter Anvin" <[EMAIL PROTECTED]> wrote:
> 
> HPA> > Global files such as /etc/*, /usr/include/*, etc. obviously *must* remain
> HPA> > in a locale invariant encoding. This is today ISO 646 IRV (US-ASCII).
> HPA> > Hopefully it will one day become UTF-8. ISO 8859-1 has no place in
> HPA> > /etc/passwd and similar files and should be strongly discouraged there.
> HPA> 
> HPA> Excuse me, but that's ridiculous.  /etc/passwd contains the names of
> HPA> people, and well, people usually don't care when they are named that
> HPA> they're going to be put into /etc/passwd.  The sysadmin has very
> HPA> little control over this -- after all, the user can run chfn(1) and
> HPA> set that up directly.  /etc/passwd should typically be encoded in
> HPA> the system default locale.
> HPA> 
> HPA> In practice, as all of this painfully illustrates, multiple
> HPA> encodings in anything but an isolated environment are ultimately
> HPA> futile.  Whereas data in a lot of contexts can be labelled, stuff that
> HPA> is "around the system in general", be it usernames, filenames,
> HPA> /etc/passwd, etc., ultimately has to be encoded in the encoding
> HPA> specified by the system default locale, and the goal is for that to
> HPA> become UTF-8.
> 
> Excuse me, what a mess that would create! How would you know which encoding
> /etc/passwd is in? What if you have both Japanese and Russian users on
> your system? UTF-8 is the only candidate. You can use iconv to convert
> user's input to UTF-8.

How, exactly, are you three disagreeing?  The first quote seems to say
"/etc/* must be in the same encoding, today that's ISO 646 IRV, hopefully
that will become UTF-8"; the second, "files around the system must be in
a specific, non-user-set locale, and hopefully eventually that will
become UTF-8", and the third, "/etc/passwd can't be user-specific".

Those are all in agreement, unless I'm misinterpreting the phrase
"locale-invariant" (which I'm assuming to mean "always in the same
locale"; I suppose it could mean "an encoding that works in any locale",
but that would be inconsistent with "hopefully it will one day become
UTF-8".)  Nothing's ridiculous, and nothing's a mess. :)

Of course, chfn should convert its input to the system locale's
encoding, and do something reasonable if the data entered can't be
represented in it; I doubt it does this right now.  I don't even know
whether there is such a thing as a "system locale"; is there any way to
set one?  As far as I know, it's just assumed to be ASCII.
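
A rough sketch of the conversion step described above, in C with
iconv(3).  It is only an illustration under some assumptions:
convert_to_codeset() is a hypothetical helper, not code from chfn, and
both encodings are passed in explicitly because, as noted, there doesn't
seem to be a standard way to ask for a "system locale".  Refusing input
that can't be represented (glibc's iconv fails with EILSEQ) is just one
possible choice for "something reasonable"; prompting the user again
would be another.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of chfn): convert the NUL-terminated
 * string `in` from encoding `from` to encoding `to` using iconv(3).
 * Returns a malloc'd NUL-terminated string, or NULL on failure with
 * errno set.  In particular, errno is EILSEQ when the input contains a
 * character that the target encoding can't represent. */
char *convert_to_codeset(const char *in, const char *from, const char *to)
{
    assert(in != NULL && from != NULL && to != NULL);

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return NULL;            /* unsupported conversion (EINVAL) */

    size_t inleft = strlen(in);
    /* Rough upper bound on output size; a longer result makes iconv
     * fail with E2BIG, which this sketch simply reports as an error. */
    size_t outsize = inleft * 4 + 16;
    char *out = malloc(outsize);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *inp = (char *)in;     /* iconv's prototype takes char ** */
    char *outp = out;
    size_t outleft = outsize - 1;   /* keep one byte for the NUL */

    /* Convert the text, then flush any pending shift state. */
    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1 ||
        iconv(cd, NULL, NULL, &outp, &outleft) == (size_t)-1) {
        int saved = errno;      /* free/iconv_close may clobber errno */
        free(out);
        iconv_close(cd);
        errno = saved;
        return NULL;
    }

    *outp = '\0';
    iconv_close(cd);
    return out;
}
```

A caller might pass nl_langinfo(CODESET), after setlocale(LC_ALL, ""),
as the source encoding, since that reports the codeset of the user's
current locale; the target would be whatever the admin decides the
system encoding is.  In the "C" locale, glibc reports the codeset as
"ANSI_X3.4-1968", i.e. ASCII.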

Tangent:

In practice, I'd consider it unreasonable to use any locale but the
system locale, except where the system locale's encoding is a subset of
the user's, which is usually the case when the system locale is ASCII
but will almost never be the case when it's UTF-8.  If my terminal (and
hence my locale) is SJIS and the system locale is UTF-8, I can't even
cat any files other than my own without using iconv.  This isn't a
problem for those of us who upgrade to UTF-8, but it leaves those who
want to use other encodings with a pretty major problem.  (And if you
can't reasonably use alternate encodings, what's the point of being able
to set the encoding in the locale at all?  It gives an upgrade path to
UTF-8, but it doesn't give the choice of alternatives it does now, nor
any upgrade path to anything incompatible with UTF-8.)

To be clear, I agree with using UTF-8 as the system locale, and I'd
like to see everyone using it.  But in practice, a few people aren't
going to want to, and if even a few people don't want to use UTF-8,
however irrational their reasons, that will keep many admins from
switching their systems to it.

-- 
Glenn Maynard
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
