Re: Please do not use en_US.UTF-8 outside the US

Ienup Sung Fri, 18 Oct 2002 14:59:57 -0700

I also think the current POSIX global, single-locale model is limiting and
that we do need MT-safe, multi-locale APIs. I know not everyone will agree,
but I believe we need at least one locale for each and every region/country,
since at the moment it is not really possible to maintain a single, unified
and universal set of cultural conventions and language/writing system data
in a cost-effective and manageable manner, even within the LC_CTYPE category
data alone.
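
To illustrate the contrast with a global locale, here is a minimal Python
sketch of what a multi-locale, MT-safe interface could look like: the locale
is an immutable object handed to each call rather than process-wide state.
The NumericLocale class and format_number function are hypothetical names
made up for this illustration, not an existing API, and the separator values
are just the familiar en_US and de_DE conventions.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class NumericLocale:
    """Hypothetical immutable locale object, passed explicitly to each call."""
    name: str
    decimal_point: str
    thousands_sep: str


# Illustrative LC_NUMERIC-style data for two locales.
EN_US = NumericLocale("en_US.UTF-8", ".", ",")
DE_DE = NumericLocale("de_DE.UTF-8", ",", ".")


def format_number(value: float, loc: NumericLocale) -> str:
    """Format with two decimals using only the locale passed in."""
    text = f"{value:,.2f}"  # C-style form, e.g. "1,234,567.89"
    # Swap separators through a placeholder so they do not clobber each other.
    return (text.replace(",", "\0")
                .replace(".", loc.decimal_point)
                .replace("\0", loc.thousands_sep))


results = {}


def worker(loc: NumericLocale) -> None:
    # Each thread works with its own locale; nothing is set process-wide.
    results[loc.name] = format_number(1234567.891, loc)


threads = [threading.Thread(target=worker, args=(loc,)) for loc in (EN_US, DE_DE)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results["en_US.UTF-8"])  # 1,234,567.89
print(results["de_DE.UTF-8"])  # 1.234.567,89
```

Because no call reads or changes shared locale state, two threads can format
for different locales at the same time without interfering with each other.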


Following that reasoning, having many Unicode/UTF-8 locales is not such a bad
idea at all in my opinion (well, at least at the locale instance level, not
at the locale definition source level). (In doing so, one could also use
a template to populate Unicode locales rather rapidly, sharing as many
common locale definitions as possible.)
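
The template idea can be sketched roughly as follows in Python. This is only
an illustration under assumed names: COMMON_UTF8, make_locale and the
placeholder strings are hypothetical and do not correspond to any real locale
source format. Each locale instance is built from one shared definition plus
the few parts that actually differ.

```python
# Definitions shared by every UTF-8 locale built from this template.
# The LC_CTYPE value is a placeholder standing in for common character data.
COMMON_UTF8 = {
    "codeset": "UTF-8",
    "LC_CTYPE": "shared-unicode-ctype",
}


def make_locale(name, **overrides):
    """Build one locale instance: the shared template plus specific parts."""
    definition = dict(COMMON_UTF8)  # copy, so the template is never modified
    definition.update(overrides)
    definition["name"] = name
    return definition


LOCALES = {
    loc["name"]: loc
    for loc in (
        make_locale("en_US.UTF-8", decimal_point=".", thousands_sep=","),
        make_locale("de_DE.UTF-8", decimal_point=",", thousands_sep="."),
        # Turkish overrides the shared LC_CTYPE because of its i/I handling.
        make_locale("tr_TR.UTF-8", decimal_point=",", thousands_sep=".",
                    LC_CTYPE="shared-unicode-ctype+turkish-i"),
    )
}

for name, definition in sorted(LOCALES.items()):
    print(name, definition["codeset"], definition["LC_CTYPE"])
```

There are many locale instances but only one copy of the common source, which
is the distinction between the instance level and the definition source level.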

Yes, you're correct. What I meant by "One can always add any locales
after the Solaris installation" in my previous email was that people can add
their locales by being root or by asking someone who can be root (i.e., the
sys admin in many cases). Obviously, for security reasons alone, we wouldn't
be able to allow locale addition/removal by anyone and everyone, even
though in my opinion it would be possible by providing a utility that does
setuid to root before the locale installation, or something similar.

With regards,

Ienup

PS. One example of why it's difficult to have a single, unified
LC_CTYPE is the following:

In Turkish, as I understand it, the (simple) case conversion goes as
follows:

        From case --------------------> To case

        I (U+0049)                      dotless i (U+0131)
        i (U+0069)                      I with dot above (U+0130)
        I with dot above (U+0130)       i (U+0069)
        dotless i (U+0131)              I (U+0049)

But in other languages, it usually goes as follows:

        From case --------------------> To case

        I (U+0049)                      i (U+0069)
        i (U+0069)                      I (U+0049)
        I with dot above (U+0130)       i (U+0069)
        dotless i (U+0131)              I (U+0049)

for obvious reasons (and also due to some limitations we have in POSIX).
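
As a small illustration, the two tables can be written out as Python
dictionaries and compared. This is a sketch limited to the four letters
listed above. The names TURKISH, OTHERS and swap_i_case are made up for the
example, and real case conversion covers far more than these four code points.

```python
# "From case" -> "To case" for the four letters, exactly as in the tables.
TURKISH = {
    "\u0049": "\u0131",  # I                -> dotless i
    "\u0069": "\u0130",  # i                -> I with dot above
    "\u0130": "\u0069",  # I with dot above -> i
    "\u0131": "\u0049",  # dotless i        -> I
}
OTHERS = {
    "\u0049": "\u0069",  # I                -> i
    "\u0069": "\u0049",  # i                -> I
    "\u0130": "\u0069",  # I with dot above -> i
    "\u0131": "\u0049",  # dotless i        -> I
}


def swap_i_case(text, table):
    """Convert only the four i-like letters; leave everything else alone."""
    return "".join(table.get(ch, ch) for ch in text)


# The rows on which the two conventions disagree.
differing = sorted(ch for ch in TURKISH if TURKISH[ch] != OTHERS[ch])
for ch in differing:
    print(f"U+{ord(ch):04X}: Turkish -> U+{ord(TURKISH[ch]):04X}, "
          f"others -> U+{ord(OTHERS[ch]):04X}")
```

The two conventions disagree only on plain I (U+0049) and plain i (U+0069),
and those are exactly the letters that every other language using the Latin
script also needs, so one shared table cannot serve both.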


] Date: Fri, 18 Oct 2002 11:26:08 +0200 (MEST)
] From: Thomas Wolff <[EMAIL PROTECTED]>
] Subject: Re: Please do not use en_US.UTF-8 outside the US
] To: [EMAIL PROTECTED], Ienup Sung <[EMAIL PROTECTED]>
] 
] Thanks that a Sun engineer responds to the problem here.
] 
] keld wrote:
] > ISO 15897 also has some fallback rules. I think that could be 
] > extended in some way, so that you may specify more locales to
] > choose from, like it is done with accept-language: in http.
] > I think some software already does this. Current glibc supports
] > ISO 15897, but that support is going to be removed, as far as I know.
] ?? This is again just stupid.
] 
] 
] Ienup Sung <[EMAIL PROTECTED]> wrote:
] 
] > I just would like to point out that we started with en_US.UTF-8 and ko.UTF-8
] > at Solaris 2.6 back then 1996 or so. Since then, we've been gradually and also
] > consistently increasing the number of Unicode/UTF-8 locales and that's our
] > goal, i.e., try to supply as many as Unicode/UTF-8 locales as our (limited)
] > resource allows.
] > 
] > Also, as the locale name specifies, the en_US.UTF-8 is a locale for American
] > English at the States. We have never even tried to persuade anyone to
] > use the locale as the only solution; we are also quite surprised that people
] > have seen it that way.
] > 
] > As additional evidence, in Solaris 9, we have:
] > 
] > ... [lots of locales]
] My point is actually that it is a wrong strategy to handle it by 
] increasing the number of UTF-8 locales. There should basically 
] be no such thing as a UTF-8-bound locale. The convention of 
] using LC_CTYPE to specify both locale and encoding is OK if these 
] are handled separately.
] A generic solution is required, as also Keld argued.
] 
] 
] Ienup Sung <[EMAIL PROTECTED]> continued:
] 
] > Regarding the different number of locales for the same Solaris release
] > systems, the reason is when you install/upgrade your system, you might have
] > chosen only those locales. I.e., probably your system admin or jump start
] > installation specified or selected during the installation/upgrade or
] > during the preparation of the jump start installation script.
] > <-- This is because not everyone wants to have all the locales that we have to offer
] > and so we show what kind of locales are available during the 
] > installation/upgrade that can be selected as needed.
] 
] > One can, by the way, always add locales to an existing system after the 
] > installation and one way is specified in the following web pages:
] > 
] >     http://www.sun.com/developers/gadc/faq/sol8.html
] Sorry, this is not true.
] One cannot do it, only the system administrator can do it.
] 
] Please also consider the following response:
] 
] From: Glenn Maynard <[EMAIL PROTECTED]>
] 
] > Admins, with no personal
] > interest in UTF-8 and few users using it, are likely to only generate
] > legacy locales, and not enable UTF-8 ones.  This probably isn't any
] > particular desire *not* to have it; they just don't know the difference
] > (and shouldn't need to).
] > 
] > So, even though my system and terminal is UTF-8, and all of the systems
] > I connect to are *capable* of it, only a few actually have the locale
] > available.  This is a senseless hurdle to using UTF-8; I have to nag
] > admins to generate UTF-8 locales, even though all of the software I'm
] > using has already been updated to handle it!  Long before UTF-8 can ever
] > be the default encoding everywhere, it needs to be *available* everywhere
] > (without root intervention).
] > 
] > This is a problem on Debian, at least.  It shows a list of locale names;
] > you only get UTF-8 if you ask for it.  It should probably show a list of
] > country/language codes; eg. choosing en_US should generate both
] > en_US (ISO-8859-1) and en_US.UTF-8, unless the user specifically asks
] > for UTF-8 to not be generated.
] 
] 
] Ienup Sung <[EMAIL PROTECTED]> continued:
] 
] > Actually, we ship all our locales in a single product and so if you've
] > Solaris 8 or later, all the locales are in the Solaris Software 1 of 2 CD.
] > (Translated message files and some locale-specific files and applications for
] > French, Italian, German, Spanish, Swedish, Simplified Chinese, Traditional
] > Chinese, Japanese and Korean are at the Languages CD by the way.)
] > 
] > We couldn't do that before S8 simply because there were licensing issues on
] > fonts and some input methods that we couldn't resolve until the S8 timeframe
] > which took a lot of money and time from us.
] 
] And Glenn Maynard <[EMAIL PROTECTED]> continued:
] > I suppose one reason this isn't done is because locale generation does
] > take quite a while (maybe 20 seconds per locale on my system).  There
] > are probably other, less obvious reasons this isn't done, but I don't
] > know them.  One such might be http://bugs.debian.org/99623 ; but that
] > doesn't seem to prevent generating UTF-8 most of the time.
] 
] Once again, the approach of having to install/generate/whatever 
] locales to support UTF-8 is fundamentally wrong.
] Just separate locale and encoding recognition (using LC_CTYPE as a 
] common base) and the problem will vanish, that's the only 
] reasonable solution.
] 
] It needs to be fixed!
] 
] Thomas Wolff
] --
] Linux-UTF8:   i18n of Linux on all levels
] Archive:      http://mail.nl.linux.org/linux-utf8/
] 
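
For reference, the separation Thomas asks for in the quoted message (locale
and encoding recognized separately from one name) can be sketched as below.
This is a hypothetical illustration in Python, not how any existing C library
behaves: split_locale_name and the CONVENTIONS table are invented names, and
modifiers such as "@euro" are not handled.

```python
def split_locale_name(name, default_codeset=None):
    """Split 'language_TERRITORY.codeset' into (language, territory, codeset)."""
    base, _, codeset = name.partition(".")
    language, _, territory = base.partition("_")
    return language, territory or None, codeset or default_codeset


# Cultural conventions keyed by language and territory only (illustrative data).
CONVENTIONS = {
    ("en", "US"): {"decimal_point": "."},
    ("de", "DE"): {"decimal_point": ","},
}


def resolve(name):
    """Look up conventions without the codeset; return the codeset separately."""
    language, territory, codeset = split_locale_name(name)
    return CONVENTIONS.get((language, territory)), codeset


print(split_locale_name("en_US.UTF-8"))  # ('en', 'US', 'UTF-8')
print(split_locale_name("ko.UTF-8"))     # ('ko', None, 'UTF-8')
print(resolve("de_DE.ISO-8859-1"))       # ({'decimal_point': ','}, 'ISO-8859-1')
```

In this sketch one set of conventions serves every encoding, so no separate
definition per locale/encoding pair is needed for the lookup.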

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
