On 06/08/2013 15:26, Bruce Hill wrote:
On Tue, Aug 06, 2013 at 02:40:04PM +0100, Kerin Millar wrote:

Apparently, "utf8" is the canonical representation in glibc (which
provides the locale tool):

http://lists.debian.org/debian-glibc/2004/12/msg00028.html
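To illustrate where "utf8" comes from: when glibc resolves a locale name, it normalizes the codeset part. It keeps only ASCII letters and digits, lowercases them, and prefixes "iso" if only digits remain. So "UTF-8" becomes "utf8", which is also the spelling `locale -a` prints. The Python sketch below is modeled on that behaviour. The function names are hypothetical and this is not glibc's actual code.

```python
def normalize_codeset(codeset: str) -> str:
    """Sketch of glibc-style codeset normalization (hypothetical helper).

    Keeps only ASCII letters and digits, lowercases them, and prefixes
    "iso" when the result consists only of digits.
    """
    kept = "".join(ch.lower() for ch in codeset if ch.isascii() and ch.isalnum())
    if kept.isdigit():
        return "iso" + kept
    return kept


def normalize_locale_name(name: str) -> str:
    """Rewrite only the codeset part of language[_territory][.codeset][@modifier]."""
    base, at, modifier = name.partition("@")
    lang_territory, dot, codeset = base.partition(".")
    if dot:
        lang_territory += "." + normalize_codeset(codeset)
    return lang_territory + at + modifier


if __name__ == "__main__":
    for name in ("en_US.UTF-8", "en_US.utf8", "de_DE.ISO-8859-15@euro"):
        print(name, "->", normalize_locale_name(name))
```

Both spellings normalize to the same name, which is why en_US.UTF-8 and en_US.utf8 select the same locale data.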

That eselect enumerates the locale twice when the alternate form is
specified in /etc/env.d/02locale could be considered a minor bug.

--Kerin

RFC 3629 does not mention utf8, but I did see this notation discussed on
Wikipedia, and yes, I understand that Wikipedia is not an official source:

Other descriptions that omit the hyphen or replace it with a space, such as
"utf8" or "UTF 8", are not accepted as correct by the governing standards.[14]
Despite this, most agents such as browsers can understand them, and so
standards intended to describe existing practice (such as HTML5) may
effectively require their recognition.

[14] http://www.ietf.org/rfc/rfc3629.txt

Internally, glibc may use whatever representation it pleases.

I was only mildly curious seeing utf8 show up, because on numerous occasions
in #gentoo on FreeNode there have been reports of incorrect characters being
displayed with utf8 that were then fixed by switching to UTF-8. Having read
RFC 3629, I just made it a habit to always use the standard name (UTF-8).

That is probably due to buggy applications. According to a glibc maintainer, applications should use the nl_langinfo() function, but some try to parse the locale name itself instead. Both of these commands report the same charmap:

# LC_ALL=en_US.UTF-8 locale -k LC_CTYPE | grep charmap
# LC_ALL=en_US.utf8  locale -k LC_CTYPE | grep charmap

Ergo, applications that use the correct interface will be informed that the character encoding is "UTF-8", irrespective of the format of the locale name.
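For illustration, here is roughly what "using the correct interface" looks like, sketched in Python. On Unix-like systems, Python's locale module wraps setlocale() and nl_langinfo(). The helper name codeset_for is hypothetical, and the example locales must be installed for it to report anything.

```python
import locale


def codeset_for(locale_name: str) -> str:
    """Report the character encoding of locale_name as the C library sees it.

    Asks nl_langinfo(CODESET) instead of parsing the text after the dot
    in the locale name. Raises locale.Error if the locale is not installed.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current LC_CTYPE
    try:
        locale.setlocale(locale.LC_CTYPE, locale_name)
        return locale.nl_langinfo(locale.CODESET)
    finally:
        # Restore the previous setting so the probe has no side effects.
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    for name in ("en_US.UTF-8", "en_US.utf8"):
        try:
            print(f"{name}: {codeset_for(name)}")
        except locale.Error:
            print(f"{name}: locale not installed")
```

On a glibc system with en_US.UTF-8 generated, both names should report UTF-8, in line with the locale -k output.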

Given the above, sticking to the "<lang>_<territory>.UTF-8" format seems wise.


Having read the remainder of the Debian ML thread you referenced, I have a
headache. Debian did that to me when I used it for ~3 months in 2003.  :-)

Cheers,
Bruce

