On 06/08/2013 23:42, Stroller wrote:

On 6 August 2013, at 14:04, Kerin Millar wrote:
...
If undefined, the value of LC_COLLATE is inherited from LANG. I'm not sure that 
overriding it is particularly useful nowadays but it doesn't hurt.

It's been a couple of years since I looked into this, but I'm given to believe 
that LANG should set all LC_ variables correctly, and that overriding them is 
frowned upon.

As has been mentioned, there are valid reasons to want to override the collation. Here is a concrete example:

https://lists.gnu.org/archive/html/bug-gnu-utils/2003-08/msg00537.html

Strictly speaking, grep is correct to behave that way, but it can be confounding. In an ideal world, everyone would be using named classes (such as [[:lower:]]) instead of ranges (such as [a-z]) in their regular expressions, but it's not an ideal world.
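
To illustrate why a range can surprise under a non-C collation, here is a minimal Python sketch. It does not call grep or the C library. The DICTIONARY_ORDER string is an assumed, simplified collation in which each lowercase letter sorts immediately before its uppercase counterpart; it is not real locale data. The helpers in_range and matched_by_a_to_z are hypothetical names used only for illustration.

```python
import string

# Assumed, simplified collation order: aAbBcC...zZ (not real locale data).
DICTIONARY_ORDER = "".join(
    lo + up for lo, up in zip(string.ascii_lowercase, string.ascii_uppercase)
)

# Codepoint order, as in the C locale: A-Z sort before a-z.
C_ORDER = string.ascii_uppercase + string.ascii_lowercase


def in_range(ch, start, end, order):
    """Return True if ch collates between start and end (inclusive) in order."""
    pos = order.index
    return pos(start) <= pos(ch) <= pos(end)


def matched_by_a_to_z(order):
    """Return the ASCII letters that a range like [a-z] covers under order."""
    return "".join(
        c for c in string.ascii_letters if in_range(c, "a", "z", order)
    )


if __name__ == "__main__":
    print("C order:         ", matched_by_a_to_z(C_ORDER))
    print("dictionary order:", matched_by_a_to_z(DICTIONARY_ORDER))
```

Under the assumed dictionary order, the range takes in every uppercase letter except Z, because only Z sorts after z. A named class is defined by character type rather than by collation position, so it does not shift in this way.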

These days, grep no longer exhibits this characteristic in Gentoo. Nevertheless, it serves as a valid example of how collations for UTF-8 locales can be a liability.

Of the other distros, Arch Linux also defined LC_COLLATE=C, although I understand that they have just recently stopped doing that.

On a production system, I would still be inclined to use it for reasons of safety. For that matter, some people refuse to use UTF-8 at all on the grounds of security; the handling of variable-width encodings continues to be an effective bug inducer.

I had to do this myself because, due to a bug, the en_GB time formatting failed 
to display am or pm. I believe this should be fixed now.

Presumably, one of the following applied:

a) LANG was defined inappropriately
b) LANG was defined appropriately but LC_TIME was defined otherwise
c) LC_ALL was defined, trumping all

I would definitely not advise doing any of these things.
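
For reference, the lookup order behind those three cases can be sketched in Python. This is a rough model of the POSIX precedence (LC_ALL, then the category's own variable, then LANG), not a call into the C library. The function name effective_locale is hypothetical, and the final fallback to "C" is an assumption, since the default is implementation-defined.

```python
import os


def effective_locale(category, env=None):
    """Resolve the locale name for a category such as "LC_COLLATE".

    Follows the POSIX lookup order: LC_ALL first, then the category's
    own variable, then LANG. Unset or empty variables are skipped.
    """
    env = os.environ if env is None else env
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:
            return value
    # Assumption: fall back to the C locale when nothing is set.
    return "C"


if __name__ == "__main__":
    for cat in ("LC_COLLATE", "LC_TIME"):
        print(cat, "->", effective_locale(cat))
```

With only LANG=en_GB.UTF-8 set, every category resolves to en_GB.UTF-8. Adding LC_COLLATE=C changes collation alone, while setting LC_ALL overrides every category regardless of the other variables.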

--Kerin
