On Sat, 1 Sep 2001, Zvi Har'El wrote:
> Forgive my ignorance, but redhat 7.1 has for X11 the locale directory
> /usr/X11R6/lib/X11/locale/en_US.UTF-8 and for glibc the locale directory
> /usr/lib/locale/en_US.utf8 . Is there a difference between the suffix UTF-8 and
> utf8? will xterm recognize the latter?

The locale should always be called "*.UTF-8" in the locale environment
variables, that is, in capitals and with a hyphen. However, glibc
normalizes encoding names internally: it removes all hyphens,
underscores, etc. and converts everything to lowercase. In particular,
the string that setlocale returns is in this normalized form (which
portable applications should never look at!). That's why you
occasionally see "utf8" popping up, especially in locale pathnames.
It's all the fault of the OS vendors, who couldn't agree on whether to
write "ISO_8859-1", "ISO-8859-1" or "iso8859-1". For UTF-8 we at least
have a single standard spelling ("UTF-8"), but the normalization
mechanism is in place now, even though it was only needed for the
ISO 8859 names.

It might be nicer if glibc, instead of relying on a bare normalization
routine, kept an internal table of known encoding names and used the
normalization routine only to match against those stored names. If it
recognizes a name such as "UTF-8", it should output that as the
canonical form, not the normalized "utf8". That would avoid lots of
headaches.

I'd argue that the canonical names should be the preferred MIME names:

  UTF-8, ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-5, ISO-8859-6,
  ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-13, ISO-8859-15, EUC-JP,
  EUC-KR, GB2312, KOI8-R, KOI8-U, VISCII, WINDOWS-1251,
  WINDOWS-1256

Li18nux was planning to standardize a locale-name syntax, and glibc
could then hopefully normalize into that standard syntax.

The recommended procedure for recognising that a locale uses UTF-8
is described in

  http://www.cl.cam.ac.uk/~mgk25/unicode.html#activate

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
