The ISO/IEC TR 14652 transliteration mechanism (which is already partly
implemented in glibc 2.1.95) was probably primarily intended for people
whose I/O devices cannot handle UTF-8 but who want to see as much as
possible of the wide-character information in their 7/8-bit coding
system. In a UTF-8 locale, every wide character can be converted into a
multi-byte character uniquely and without loss of information, so at
first sight no transliteration seems necessary.

While doing some research for the transliteration tables that I am
currently putting together, it occurred to me that there is a second
quite good reason for using transliteration. Even if people work in a
UTF-8 locale with fully Unicode-capable I/O devices everywhere, their
brains might still not yet be fully Unicode capable, and they might
still want library-level transliteration to aid in reading the text.
I would find it very convenient to have a
it very convenient to have a

  de_DE.UTF-8@romanized

locale, that uses the UTF-8 encoding, but nevertheless applies
transliteration (optimized for a German reader) to non-Latin scripts.
This way, if, say, Russian, Greek, Hebrew, or Arabic correspondents
write their names in the From: lines of email headers in their native
script, I will still get a romanized display that helps me guess the
pronunciation of their names reasonably well. This has nothing to do
with converting to ASCII. In fact, many of the ISO standardized
transliteration schemes add lots of accents to the romanized output of
a transliteration, in order to minimize the loss of information.
The UTF-8 output of romanized Greek or Cyrillic text will typically
contain lots of Latin characters not found in ISO 8859 or ISO 6937.
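For experimenting with what glibc transliteration already does today,
the //TRANSLIT suffix understood by glibc's iconv is a convenient entry
point. (The substitutions chosen are locale-dependent; the German locale
below is just an illustration, and the exact output depends on which
locale tables are installed.)

```shell
# Ask glibc's iconv to transliterate characters that the target charset
# (here plain ASCII) cannot represent; the chosen substitutions come
# from the transliteration rules of the current LC_CTYPE locale.
echo 'Grüße aus München' | LC_CTYPE=de_DE.UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT
```

Characters with no transliteration rule in the active locale typically
fall back to a plain '?'.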

Just something that people playing around with glibc transliteration
might keep in mind.

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/
