On 06/08/2013 15:26, Bruce Hill wrote:
On Tue, Aug 06, 2013 at 02:40:04PM +0100, Kerin Millar wrote:
Apparently, "utf8" is the canonical representation in glibc (which
provides the locale tool):
http://lists.debian.org/debian-glibc/2004/12/msg00028.html
The fact that eselect lists the locale twice when the alternate form is
specified in /etc/env.d/02locale could be considered a minor bug.
--Kerin
RFC 3629 does not mention "utf8", but I did see the following note in
Wikipedia (and yes, I understand that Wikipedia is not official):
Other descriptions that omit the hyphen or replace it with a space, such as
"utf8" or "UTF 8", are not accepted as correct by the governing standards.[14]
Despite this, most agents such as browsers can understand them, and so
standards intended to describe existing practice (such as HTML5) may
effectively require their recognition.
[14] http://www.ietf.org/rfc/rfc3629.txt
Internally, glibc may use whatever representation it pleases.
I was only mildly curious to see utf8 show up, because on numerous occasions
in #gentoo on FreeNode there have been reports of characters displayed
incorrectly with utf8 that were then fixed by switching to UTF-8. Having read
RFC 3629, I just made it a habit to always use the standard form (UTF-8).
That is probably due to buggy applications. According to a glibc maintainer,
applications should be using the nl_langinfo() function, but some try to
parse the locale name itself instead. Both of these commands produce the same
output:
# LC_ALL=en_US.UTF-8 locale -k LC_CTYPE | grep charmap
# LC_ALL=en_US.utf8 locale -k LC_CTYPE | grep charmap
Ergo, applications that use the correct interface will be informed that
the character encoding is "UTF-8", irrespective of the format of the
locale name.
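To make the difference concrete, here is a small shell sketch contrasting the two approaches. The helper names naive_is_utf8 and charmap_of are hypothetical, made up for illustration; the first mimics a buggy application that compares the text after the dot in the locale name literally, while the second asks glibc's locale tool for the charmap, which is the same value nl_langinfo(CODESET) hands to C programs. It assumes both spellings of the en_US locale are generated on the system; if they are not, the charmap column will show whatever fallback locale glibc chose.

```shell
#!/bin/sh
# Hypothetical helper: the fragile approach some applications take.
# Strip everything up to the first dot and compare the rest literally.
naive_is_utf8() {
    case "${1#*.}" in
        UTF-8) echo yes ;;
        *)     echo no ;;
    esac
}

# Hypothetical helper: the robust approach. Let glibc resolve the locale
# and report its charmap (the value nl_langinfo(CODESET) returns).
charmap_of() {
    LC_ALL="$1" locale charmap 2>/dev/null
}

# Compare both methods for the two spellings of the same locale.
for loc in en_US.UTF-8 en_US.utf8; do
    printf '%-12s name-parse: %-3s charmap: %s\n' \
        "$loc" "$(naive_is_utf8 "$loc")" "$(charmap_of "$loc")"
done
```

On a system with the locale generated, the name-parsing check disagrees between the two spellings, while the charmap column should read UTF-8 for both.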
Given the above, sticking to the "<lang>_<territory>.UTF-8" format seems
wise.
Having read the remainder of the Debian ML thread you referenced, I have a
headache. Debian did that to me when I used it for ~3 months in 2003. :-)
Cheers,
Bruce