Re: Please do not use en_US.UTF-8 outside the US

Thomas Wolff Thu, 17 Oct 2002 08:08:00 -0700

Markus Kuhn wrote:

> [EMAIL PROTECTED] wrote on 2002-10-16 14:48 UTC:
> > I came across this older mail by Markus:
> > 
> > > General warning: Please do not use the locale name en_US.UTF-8 anywhere
> > > outside North America. Some older Solaris documentation suggested that
> > > this is the only UTF-8 locale you'll ever need, as locales don't change
> > > much sensible beyond the encoding anyway. This is not the case any more
> > > today!
> > 
> > The problem is that on many Sun installations, en_US.UTF-8 is the 
> > only UTF-8 locale available at all!
> 
> I can't reproduce this problem report on our current Suns:
> 
> $ uname -a ; locale -a | grep UTF-8
> SunOS piper 5.8 Generic_108528-12 sun4u sparc SUNW,Ultra-4
> en_US.UTF-8
> fr.UTF-8
> fr_FR.UTF-8
> fr_FR.UTF-8@euro
> de.UTF-8
> es.UTF-8
> it.UTF-8
> ja_JP.UTF-8
> ko.UTF-8
> sv.UTF-8
> zh.UTF-8
> zh_TW.UTF-8

OK, I have:


wolff@fscce14:~> uname -a ; locale -a | grep UTF-8
SunOS fscce14 5.8 Generic_108528-12 sun4us sparc FJSV,GPUSK
en_US.UTF-8
sv.UTF-8
sv_SE.UTF-8
sv_SE.UTF-8@euro


> It is slightly unpleasant that there is no Commonwealth en.UTF-8 or
> British en_GB.UTF-8, but as long as you use en_US only in LC_CTYPE and
> not in LANG, you are usually fairly safe from the terror of US cultural
> conventions.
> 
> > A decent solution to this problem would be to handle basic locale 
> > information ("en_US") and encoding suffix ("UTF-8") separately and 
> > specify that ANY available locale can be suffixed with ANY known 
> > encoding, so installed de, gb, whatever locales could always be 
> > run with UTF-8.
> > Is anything specified anywhere about this?
> 
> http://www.opengroup.org/onlinepubs/007904975/functions/setlocale.html

I think that the formulation
        "If the string does not correspond to a valid locale, 
        setlocale() shall return a NULL pointer and the international 
        environment is not changed."
is as stupid as it could be since it imposes an "all or nothing" 
locale matching strategy.
I don't see why aspects that are handled independently should be 
tied together this way.
Moreover, one would expect decent fallback behaviour, e.g. 
mapping "en_GB" to "en" where "en_GB" is not available.
How can this be changed?
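
To illustrate the kind of fallback meant here, the following is a minimal sketch at the shell level. It does not change what setlocale() does; it only picks a name that is actually installed before the environment is set. The function name resolve_locale and its two-argument interface are made up for this example, and it assumes a POSIX-conforming shell (the legacy Bourne /bin/sh may not support these parameter expansions).

```shell
#!/bin/sh
# Hypothetical helper (not part of any standard): print the first installed
# name that matches a requested locale, falling back from "ll_CC" to "ll".
#   $1 = requested locale, e.g. "en_GB.UTF-8"
#   $2 = newline-separated list of installed locales (e.g. from `locale -a`)
resolve_locale() {
    wanted=$1
    available=$2
    # Split off an optional ".codeset" suffix so it survives the fallback.
    case $wanted in
        *.*) base=${wanted%%.*}; suffix=.${wanted#*.} ;;
        *)   base=$wanted;       suffix= ;;
    esac
    lang=${base%%_*}    # "en_GB" -> "en"
    # Candidates in order of preference: exact name, then language only.
    for candidate in "$base$suffix" "$lang$suffix"; do
        if printf '%s\n' "$available" | grep -Fqx -e "$candidate"; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1    # nothing suitable is installed
}

# Possible use (assumption: `locale -a` lists the installed locales):
#   LANG=$(resolve_locale sv_FI.UTF-8 "$(locale -a)") && export LANG
```

With the locale list shown earlier for fscce14, a request for "sv_FI.UTF-8" would resolve to "sv.UTF-8", while a request for "de_DE.UTF-8" would fail because neither "de_DE.UTF-8" nor "de.UTF-8" is installed there.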


> In principle, you could set
> 
>   LANG=de LC_CTYPE=en_US.UTF-8

OK, I get:

wolff@fscce14:~> LANG=de LC_CTYPE=en_US.UTF-8 /bin/sh
couldn't set locale correctly
couldn't set locale correctly

This is really a nuisance.

> However, in practice, if "de" is for ISO 8859-1, then it will contain
> only collating data for ISO 8859-1 and therefore not work as well as if
> you had taken the collating data from a full UTF-8 locale that comes
> with all the necessary data. Therefore, in practice, the locales that
> you mix with LC_* should preferably come with identical encodings.

Let's stay with the basic things for a moment:

I want an LC_* setting that tells my applications to use UTF-8 and 
doesn't affect the system inappropriately otherwise, and that works 
with SunOS and doesn't let /bin/sh choke!
And the "Open Group Base Specification" should not prescribe 
some restrictive handling that obstructs the most intuitive and 
simple configuration.
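
As a rough sketch of the configuration asked for here, the following picks a UTF-8 locale for LC_CTYPE only and leaves LANG and the other categories alone. The helper name pick_utf8_ctype is invented for this example, the preference order (own language first, then any UTF-8 locale) is an assumption, and a POSIX-conforming shell is assumed. Whether a given /bin/sh then accepts the resulting mix of locales is not something the sketch can guarantee.

```shell
#!/bin/sh
# Hypothetical helper: choose a UTF-8 locale name to use for LC_CTYPE.
#   $1 = preferred language code, e.g. "de"
#   $2 = newline-separated list of installed locales (e.g. from `locale -a`)
pick_utf8_ctype() {
    lang=$1
    available=$2
    # First choice: a UTF-8 locale for the preferred language,
    # with or without a territory ("de.UTF-8" or "de_DE.UTF-8").
    match=$(printf '%s\n' "$available" |
            grep -E "^${lang}(_[A-Z][A-Z])?\.UTF-8\$" | head -n 1)
    # Otherwise: any UTF-8 locale at all, since only the encoding matters here.
    if [ -z "$match" ]; then
        match=$(printf '%s\n' "$available" | grep -F '.UTF-8' | head -n 1)
    fi
    # Succeed and print the name only if something was found.
    [ -n "$match" ] && printf '%s\n' "$match"
}

# Possible use: set only the character-type category, nothing else.
#   LC_CTYPE=$(pick_utf8_ctype de "$(locale -a)") && export LC_CTYPE
```

Only LC_CTYPE is touched, so message language, collation, and date or number formats keep coming from whatever LANG already says.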

Kind regards,
Thomas
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
