On 10 November 2010 22:45, Ken Thomases <k...@codeweavers.com> wrote:
> On Nov 10, 2010, at 2:27 PM, Hin-Tak Leung wrote:
>
>> --- On Wed, 10/11/10, Ken Thomases <k...@codeweavers.com> wrote:
>>
>>> Are you sure about that? Checking on a couple of
>>> Linux systems here, the "locale" command reports:
>>>
>>> $ locale
>>> LANG=en_US.UTF-8
>>> LC_CTYPE="en_US.UTF-8"
>>> ...
>>
>> mine (fedora x86_64) does the utf8 thing:
>>
>> # locale
>> LANG=en_GB.utf8
>> LC_CTYPE="en_GB.utf8"
>> ...
>>
>> so there is some truth in the reporter's assertion - what it means is that
>> it varies between different linux'es!!!
>
> I should have been clearer. The output just reflects your environment. So,
> you have LANG set to en_GB.utf8. I had LANG set to en_US.UTF-8. My only
> point was to say that the "UTF-8" form is acceptable. It was not to suggest
> that "utf8" is not, nor that one or the other is a standard.
>
> The real question is: does the Linux C library accept 'UTF-8' in the
> environment variables? I believe it does, which is useful because that's
> what Mac OS X requires. (It doesn't accept "utf8".)
>
> For example, the following reports just fine on some Linux systems here:
>
>   LC_ALL=en_GB.UTF-8 locale
>
> As does your case:
>
>   LC_ALL=en_GB.utf8 locale
>
> But the following both produce some diagnostics indicating that the C library
> is choking on the value:
>
>   LC_ALL=en_GB.bogus locale
>   LC_ALL=en_GB.UTF-9 locale
>
> I take this to mean it's a legitimate test of whether a value is valid.
> Further, it indicates that (at least some) Linuxes take either form.
I'm getting the same behaviour (Ubuntu 10.10): LC_ALL accepts either utf8 or UTF-8 for en_GB, en_IE, etc.

The caveat is that the primary locale needs to exist (and presumably needs to have a UTF-8 variant installed). For example, I don't have a French locale (fr_FR) installed on my machine, so the following reports errors:

  LC_ALL=fr_FR.UTF-8 locale

This means that systems without the chosen English locale installed (en_US or en_GB) will still fail.

What is wrong with iterating over the output of `locale -a`, or of `locale -a | grep -F utf8`, to find a UTF-8 based locale? Or even:

  LC_ALL=`locale -a | grep -F utf8 | head -n 1` sed ... authors.c

- Reece
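A minimal sketch of the `locale -a` approach as a POSIX shell fragment. The helper names `pick_utf8_locale` and `run_with_utf8_locale` are hypothetical, and matching both the `utf8` and `UTF-8` spellings case-insensitively is an assumption that goes slightly beyond the plain `grep -F utf8` filter, which is case-sensitive and would miss names listed as `UTF-8`.

```shell
#!/bin/sh
# Sketch only: pick the first installed UTF-8 locale and run a command
# under it. Helper names are hypothetical, not from any existing script.

# Read locale names on stdin and print the first one whose codeset is
# UTF-8, accepting "utf8" or "UTF-8" (any case), optionally followed by
# a modifier such as "@latin".
pick_utf8_locale() {
    grep -i -E '\.utf-?8(@|$)' | head -n 1
}

# Run "$@" with LC_ALL set to the first UTF-8 locale reported by
# `locale -a`; fail with a message if none is installed.
run_with_utf8_locale() {
    loc=$(locale -a 2>/dev/null | pick_utf8_locale)
    if [ -z "$loc" ]; then
        echo "no UTF-8 locale installed" >&2
        return 1
    fi
    LC_ALL=$loc "$@"
}
```

For example, `run_with_utf8_locale locale` should print the settings for whichever UTF-8 locale was picked, and the sed invocation could be wrapped the same way. This only checks that some UTF-8 locale is installed; it does not guarantee an English one.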