On Sat, Nov 03, 2001 at 04:04:57PM +0200, Nerijus Baliunas wrote:
> On 31 Oct 2001 09:49:21 -0800 "H. Peter Anvin" <[EMAIL PROTECTED]> wrote:
>
> HPA> > Global files such as /etc/*, /usr/include/*, etc. obviously *must* remain
> HPA> > in a locale invariant encoding. This is today ISO 646 IRV (US-ASCII).
> HPA> > Hopefully it will one day become UTF-8. ISO 8859-1 has no place in
> HPA> > /etc/passwd and similar files and should be strongly discouraged there.
> HPA>
> HPA> Excuse me, but that's ridiculous. /etc/passwd contains the names of
> HPA> people, and well, people usually don't care when they are named that
> HPA> they're going to be put into /etc/passwd. The sysadmin has very
> HPA> little control over this -- after all, the user can run chfn(1) and
> HPA> set that up directly. /etc/passwd should be typically be encoded in
> HPA> the system default locale.
> HPA>
> HPA> In practice, as all of this painfully illustrates, is that multiple
> HPA> encodings in anything but an isolated environment is ultimately
> HPA> futile. Whereas data in a lot of contexts can be labelled, stuff that
> HPA> is "around the system in general" -- may it be usernames, filenames,
> HPA> /etc/passwd, etc, are ultimately have to be encoded in the encoding
> HPA> specified by the system default locale, and the goal is for that to
> HPA> become UTF-8.
>
> Excuse me, what a mess that would create! How would you know which encoding
> /etc/passwd is in? What if you have both Japanese and Russian users on
> your system? UTF-8 is the only candidate. You can use iconv to convert
> user's input to UTF-8.

How, exactly, are you three disagreeing? The first quote seems to say "/etc/* must be in the same encoding; today that's ISO 646 IRV, and hopefully it will become UTF-8"; the second, "files around the system must be in a specific, non-user-set locale, and hopefully that will eventually become UTF-8"; and the third, "/etc/passwd can't be user-specific". Those all agree, unless I'm misinterpreting the phrase "locale invariant" (which I'm taking to mean "always in the same locale"; I suppose it could mean "an encoding that works in any locale", but that would be inconsistent with "hopefully it will one day become UTF-8"). Nothing's ridiculous, and nothing's a mess. :)

Of course, chfn should convert to whatever the system locale is, and do something reasonable if the entered data doesn't fit in that locale; I doubt it does this right now. I don't even know whether there is such a concept as a "system locale"; is there any way to set one? As far as I know, it's just assumed to be ASCII.

Tangent: in practice, I'd consider using any locale other than the system locale to be unreasonable, except where the system locale is a subset of the user locale (usually the case with ASCII, but almost never the case with UTF-8). If my terminal (and hence my locale) is SJIS and the system locale is UTF-8, I can't even cat any files other than my own without using iconv. This isn't a problem for those of us who upgrade to UTF-8, but it leaves anyone who wants to use another encoding with a pretty major problem. (And if you can't reasonably use alternate encodings, what's the point of being able to set one in the locale at all? It gives an upgrade path to UTF-8, but it doesn't offer a choice of alternatives as it does now, nor any upgrade path to anything incompatible with UTF-8.)
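
To make the chfn point concrete, here is a minimal sketch of the conversion step, written in Python rather than C: bytes.decode()/str.encode() stand in for iconv(3), and the function name recode_for_system, its parameters, and the choice to reject unrepresentable input by default are all hypothetical, not how chfn actually behaves. The same function with errors="replace" approximates the "cat through iconv" case, where losing a few characters on display is acceptable.

```python
def recode_for_system(data: bytes, user_enc: str,
                      system_enc: str = "utf-8",
                      errors: str = "strict") -> bytes:
    """Hypothetical helper: convert bytes typed in the user's encoding
    into the system encoding (the job iconv(3) does in C).

    With errors="strict" (the default), raise ValueError when the input
    is not valid in user_enc or when some character has no representation
    in system_enc. With errors="replace", unrepresentable characters
    become "?" instead, which may be good enough for display.
    """
    try:
        # Decoding is always strict: garbage input is never silently accepted.
        text = data.decode(user_enc)
    except UnicodeDecodeError as exc:
        raise ValueError(f"input is not valid {user_enc}: {exc}") from exc
    try:
        return text.encode(system_enc, errors=errors)
    except UnicodeEncodeError as exc:
        # The "data entered doesn't fit in that locale" case: refuse it.
        raise ValueError(f"cannot represent {text!r} in {system_enc}") from exc
```

For example, a name typed on an SJIS terminal, "山田太郎".encode("shift_jis"), converts cleanly into a UTF-8 system, while a KOI8-R name such as "Иван" is rejected on an ASCII-only system unless errors="replace" is passed, in which case it comes out as b"????".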

To be clear, I agree with using UTF-8 as a system locale, and I'd like to see everyone using it. In practice, though, a few people aren't going to want to, and if even a few people don't want to use UTF-8, however irrational their reasons, that will keep many admins from switching a system over to it.

-- 
Glenn Maynard

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
