Markus Kuhn wrote:
> [EMAIL PROTECTED] wrote on 2002-10-16 14:48 UTC:
> > I came across this older mail by Markus:
> >
> > > General warning: Please do not use the locale name en_US.UTF-8 anywhere
> > > outside North America. Some older Solaris documentation suggested that
> > > this is the only UTF-8 locale you'll ever need, as locales don't change
> > > much that matters beyond the encoding anyway. This is not the case any more
> > > today!
> >
> > The problem is that on many Sun installations, en_US.UTF-8 is the
> > only UTF-8 locale available at all!
>
> I can't reproduce this problem report on our current Suns:
>
> $ uname -a ; locale -a | grep UTF-8
> SunOS piper 5.8 Generic_108528-12 sun4u sparc SUNW,Ultra-4
> en_US.UTF-8
> fr.UTF-8
> fr_FR.UTF-8
> fr_FR.UTF-8@euro
> de.UTF-8
> es.UTF-8
> it.UTF-8
> ja_JP.UTF-8
> ko.UTF-8
> sv.UTF-8
> zh.UTF-8
> zh_TW.UTF-8
OK, I have:
wolff@fscce14:~> uname -a ; locale -a | grep UTF-8
SunOS fscce14 5.8 Generic_108528-12 sun4us sparc FJSV,GPUSK
en_US.UTF-8
sv.UTF-8
sv_SE.UTF-8
sv_SE.UTF-8@euro
> It is slightly unpleasant that there is no Commonwealth en.UTF-8 or
> British en_GB.UTF-8, but as long as you use en_US only in LC_CTYPE and
> not in LANG, you are usually fairly safe from the terror of US cultural
> conventions.
>
> > A decent solution to this problem would be to handle basic locale
> > information ("en_US") and encoding suffix ("UTF-8") separately and
> > specify that ANY available locale can be suffixed with ANY known
> > encoding, so installed de, gb, whatever locales could always be
> > run with UTF-8.
> > Is anything specified anywhere about this?
>
> http://www.opengroup.org/onlinepubs/007904975/functions/setlocale.html
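The proposal quoted above relies on the regular structure of X/Open-style locale names, language[_territory][.codeset][@modifier]. Because of that structure, the base locale and the encoding suffix can be separated mechanically. The following is a minimal sketch in Python, chosen only for illustration. It splits a name and swaps in another codeset. parse_locale_name and with_codeset are hypothetical helpers, not part of any standard API.

```python
import re
from typing import NamedTuple, Optional

# X/Open-style locale name: language[_territory][.codeset][@modifier]
_LOCALE_RE = re.compile(
    r"^(?P<language>[A-Za-z]+)"
    r"(?:_(?P<territory>[A-Za-z]+))?"
    r"(?:\.(?P<codeset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?$"
)


class LocaleName(NamedTuple):
    language: str
    territory: Optional[str]
    codeset: Optional[str]
    modifier: Optional[str]


def parse_locale_name(name: str) -> LocaleName:
    """Split a locale name into its four X/Open components."""
    m = _LOCALE_RE.match(name)
    if m is None:
        raise ValueError(f"not an X/Open-style locale name: {name!r}")
    return LocaleName(m["language"], m["territory"], m["codeset"], m["modifier"])


def with_codeset(name: str, codeset: str) -> str:
    """Hypothetical helper: keep language, territory and modifier, replace the codeset.

    This only builds a name; setlocale() still rejects it if no locale
    of exactly that name is installed.
    """
    p = parse_locale_name(name)
    result = p.language
    if p.territory:
        result += "_" + p.territory
    result += "." + codeset
    if p.modifier:
        result += "@" + p.modifier
    return result


if __name__ == "__main__":
    print(parse_locale_name("fr_FR.UTF-8@euro"))
    print(with_codeset("de", "UTF-8"))
```

This sketch only produces a name such as de.UTF-8. Under the setlocale() wording referenced above, the call still returns a null pointer unless that exact locale exists on the system.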
I think that the formulation
"If the string does not correspond to a valid locale,
setlocale() shall return a NULL pointer and the international
environment is not changed."
is as stupid as it could be since it imposes an "all or nothing"
locale matching strategy.
I don't see why aspects that are handled independently (such as the
language conventions and the character encoding) should be tied
together this way.
Moreover, one would expect decent fallback behaviour, e.g.
mapping "en_GB" to "en" where "en_GB" is not available, etc.
How can this be changed?
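A fallback of this kind could be layered on top of the existing setlocale() without changing the standard. The sketch below assumes a hypothetical fallback order: first drop the @modifier, then the territory, always keeping the codeset. It applies that order to one category at a time. Python's locale.setlocale() wraps the C function and raises locale.Error wherever the C call would return a null pointer.

```python
import locale
from typing import List, Optional


def fallback_candidates(name: str) -> List[str]:
    """Hypothetical fallback order, most specific first, always keeping the codeset:
    ll_TT.codeset@mod -> ll_TT.codeset -> ll.codeset
    """
    base, _, modifier = name.partition("@")
    lang_terr, dot, codeset = base.partition(".")
    language = lang_terr.split("_", 1)[0]
    suffix = dot + codeset  # empty string if the name had no codeset
    candidates = []
    if modifier:
        candidates.append(name)  # full name, modifier included
    candidates.append(base)  # modifier dropped
    if language != lang_terr:
        candidates.append(language + suffix)  # territory dropped
    return candidates


def set_category_with_fallback(category: int, name: str) -> Optional[str]:
    """Try each candidate for one category only; return the accepted setting or None.

    When setlocale() fails, the category is left unchanged, so a miss
    here never disturbs the other categories.
    """
    for candidate in fallback_candidates(name):
        try:
            return locale.setlocale(category, candidate)
        except locale.Error:
            continue  # C setlocale() returned NULL; try the next candidate
    return None
```

With this order, en_GB.UTF-8 falls back to en.UTF-8, which matches the en_GB to en mapping suggested above. Whether en.UTF-8 actually exists still depends on the installation.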
> In principle, you could set
>
> LANG=de LC_CTYPE=en_US.UTF-8
OK, I get:
wolff@fscce14:~> LANG=de LC_CTYPE=en_US.UTF-8 /bin/sh
couldn't set locale correctly
couldn't set locale correctly
This is really a nuisance.
> However, in practice, if "de" is for ISO 8859-1, then it will contain
> only collating data for ISO 8859-1 and will therefore not work as well as
> if you had taken the collating data from a full UTF-8 locale that comes
> with all the necessary data. Therefore, in practice, the locales that
> you mix with LC_* should preferably come with identical encodings.
Let's stay with the basic things for a moment:
I want an LC_* setting that tells my applications to use UTF-8 and
doesn't affect the system inappropriately otherwise, and that works
with SunOS and doesn't let /bin/sh choke!
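Within the current specification, an application can get reasonably close to this by initialising each LC_* category separately instead of calling setlocale(LC_ALL, ""). The LC_ALL call fails as a whole when any one category names an unavailable locale. After per-category initialisation, the application tests only LC_CTYPE for UTF-8 via nl_langinfo(CODESET). The following is a minimal Python sketch, assuming a Unix system (nl_langinfo() is not available on Windows). init_locale_per_category and ctype_is_utf8 are hypothetical names.

```python
import locale
from typing import Dict

# A subset of the POSIX categories, for brevity; LC_MESSAGES etc. work the same way.
CATEGORIES = {
    "LC_CTYPE": locale.LC_CTYPE,
    "LC_COLLATE": locale.LC_COLLATE,
    "LC_TIME": locale.LC_TIME,
    "LC_NUMERIC": locale.LC_NUMERIC,
    "LC_MONETARY": locale.LC_MONETARY,
}


def init_locale_per_category() -> Dict[str, bool]:
    """Load each category from the environment separately.

    setlocale(LC_ALL, "") fails as a whole if any single category resolves
    to an unavailable locale. Here each category is resolved on its own
    (LC_ALL, then LC_<category>, then LANG), so a missing "de" locale
    does not stop LC_CTYPE=en_US.UTF-8 from taking effect.
    """
    status = {}
    for name, category in CATEGORIES.items():
        try:
            locale.setlocale(category, "")
            status[name] = True
        except locale.Error:
            status[name] = False  # category keeps its previous setting
    return status


def ctype_is_utf8() -> bool:
    """True if the current LC_CTYPE locale uses UTF-8 as its codeset."""
    codeset = locale.nl_langinfo(locale.CODESET)
    # Tolerate spelling variants such as "utf8".
    return codeset.replace("-", "").upper() == "UTF8"


if __name__ == "__main__":
    print(init_locale_per_category())
    print("UTF-8 mode" if ctype_is_utf8() else "not UTF-8")
```

Consider LANG=de LC_CTYPE=en_US.UTF-8 with no "de" locale installed. Instead of failing outright, this approach would leave LC_CTYPE in UTF-8 and keep the remaining categories at their previous setting (the "C" locale at program start). It only helps programs written this way. It does not change what existing programs such as /bin/sh do.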
And the "Open Group Base Specification" should not prescribe
some restrictive handling that obstructs the most intuitive and
simple configuration.
Kind regards,
Thomas
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/