Markus Kuhn wrote:

> [EMAIL PROTECTED] wrote on 2002-10-16 14:48 UTC:
> > I came across this older mail by Markus:
> >
> > > General warning: Please do not use the locale name en_US.UTF-8 anywhere
> > > outside North America. Some older Solaris documentation suggested that
> > > this is the only UTF-8 locale you'll ever need, as locales don't change
> > > much sensible beyond the encoding anyway. This is not the case any more
> > > today!
> >
> > The problem is that on many Sun installations, en_US.UTF-8 is the
> > only UTF-8 locale available at all!
>
> I can't reproduce this problem report on our current Suns:
>
> $ uname -a ; locale -a | grep UTF-8
> SunOS piper 5.8 Generic_108528-12 sun4u sparc SUNW,Ultra-4
> en_US.UTF-8
> fr.UTF-8
> fr_FR.UTF-8
> fr_FR.UTF-8@euro
> de.UTF-8
> es.UTF-8
> it.UTF-8
> ja_JP.UTF-8
> ko.UTF-8
> sv.UTF-8
> zh.UTF-8
> zh_TW.UTF-8

OK, I have:
wolff@fscce14:~> uname -a ; locale -a | grep UTF-8
SunOS fscce14 5.8 Generic_108528-12 sun4us sparc FJSV,GPUSK
en_US.UTF-8
sv.UTF-8
sv_SE.UTF-8
sv_SE.UTF-8@euro

> It is slightly unpleasant that there is no Commonwealth en.UTF-8 or
> British en_GB.UTF-8, but as long as you use en_US only in LC_CTYPE and
> not in LANG, you are usually fairly safe from the terror of US cultural
> conventions.
>
> > A decent solution to this problem would be to handle basic locale
> > information ("en_US") and encoding suffix ("UTF-8") separately and
> > specify that ANY available locale can be suffixed with ANY known
> > encoding, so installed de, gb, whatever locales could always be
> > run with UTF-8.
> > Is anything specified anywhere about this?
>
> http://www.opengroup.org/onlinepubs/007904975/functions/setlocale.html

I think that the formulation
"If the string does not correspond to a valid locale, setlocale() shall
return a NULL pointer and the international environment is not changed."
is as stupid as it could be since it imposes an "all or nothing" locale
matching strategy. I don't see why aspects that are handled independently
should be tied together this way. Even more, one would expect decent
fallback behaviour, e.g. mapping "en_GB" to "en" where "en_GB" is not
available etc.
How can this be changed?

> In principle, you could set
>
>   LANG=de LC_CTYPE=en_US.UTF-8

OK, I get:

wolff@fscce14:~> LANG=de LC_CTYPE=en_US.UTF-8 /bin/sh
couldn't set locale correctly
couldn't set locale correctly

This is really a nuisance.

> However in practice, if "de" is for ISO 8859-1, then it will contain
> only collating data for ISO 8859-1 and therefore not work as well as if
> you had taken the collating data from a full UTF-8 locale that comes
> with all the necessary data. Therefore, in practice, the locales that
> you mix with LC_* should preferably come with identical encodings.
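
To make the fallback idea above concrete (mapping "en_GB" to "en" where "en_GB" is not available), here is a minimal Python sketch. It only illustrates the matching strategy I would like to see; it is not what setlocale() or Solaris actually does. The function names fallback_candidates and set_ctype_with_fallback are hypothetical, and the fallback order (drop the @modifier first, then the territory, always keeping the encoding suffix) is my own assumption.

```python
import locale


def fallback_candidates(name):
    """Return locale names to try, from most to least specific.

    Assumed order (not specified by any standard): the full name,
    then without the @modifier, then without the territory.
    The encoding suffix is kept in every candidate.
    """
    base, _, modifier = name.partition("@")
    lang_terr, _, codeset = base.partition(".")
    lang = lang_terr.split("_")[0]
    suffix = "." + codeset if codeset else ""

    candidates = []
    if modifier:
        candidates.append(f"{lang_terr}{suffix}@{modifier}")
    candidates.append(f"{lang_terr}{suffix}")
    if lang != lang_terr:
        candidates.append(f"{lang}{suffix}")
    return candidates


def set_ctype_with_fallback(name):
    """Try each candidate for LC_CTYPE; return the name that worked, or None."""
    for candidate in fallback_candidates(name):
        try:
            return locale.setlocale(locale.LC_CTYPE, candidate)
        except locale.Error:
            # This candidate is not installed; try the next, less specific one.
            continue
    return None
```

With this, a request for "sv_SE.UTF-8@euro" would be tried as "sv_SE.UTF-8@euro", then "sv_SE.UTF-8", then "sv.UTF-8", instead of failing outright at the first miss. Which of these actually succeeds depends on the locales installed on the machine.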

Let's stay with the basic things for a moment:
I want an LC_* setting that tells my applications to use UTF-8 and
doesn't affect the system inappropriately otherwise, and that works
with SunOS and doesn't let /bin/sh choke!

And the "Open Group Base Specification" should not prescribe some
restrictive handling that obstructs the most intuitive and simple
configuration.

Kind regards,
Thomas

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/