Re: Transliteration for Cyrillic (UTF-8 support for freedb and Ogg Vorbis)

Thomas Uwe Gruettmueller Thu, 04 Oct 2001 17:37:17 -0700

On Thursday, 4. October 2001 17:48, Alexander Voropay wrote:
> From: Edmund GRIMLEY EVANS <[EMAIL PROTECTED]> wrote:
> >The first problem is to provide a good transliteration
> >table: glibc and libiconv don't transliterate Cyrillics, I
> > think, so can anyone recommend such a table?
>
>  You could translate Unicode to KOI8-R and clear the 8th
> bit. The KOI8-* charsets were invented to work on non-8-bit-
> clean equipment in the mid-80s. Practically all Russians can
> read such a transliteration.


Edmund was talking about a "good transliteration" ;o)

I don't know anything about transliterating legacy characters,
such as Yat', so I can only talk about the Russian alphabet.

АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ
абвгдеёжзийклмнопрстуфхцчшщъыьэюя

I can't find the relevant ISO standard, but IIRC, the 
transliterated alphabet should look like:

ABVGDEËŽZIJKLMNOPRSTUFHCČŠŜ"Y'ĖÛÂ
abvgdeëžzijklmnoprstufhcčšŝ"y'ėûâ
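
As a rough sketch of how such a table could be applied (Python here; the
names CYR, LAT, TABLE and translit are made up for illustration, and the
table is just the two rows above as recalled, not checked against the
text of the ISO standard):

```python
import unicodedata

# Source alphabet and its transliteration, copied from the rows above.
# This is the table as recalled in this mail, not a verified standard.
CYR = ("АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ"
       "абвгдеёжзийклмнопрстуфхцчшщъыьэюя")
LAT = unicodedata.normalize(
    "NFC",
    "ABVGDEËŽZIJKLMNOPRSTUFHCČŠŜ\"Y'ĖÛÂ"
    "abvgdeëžzijklmnoprstufhcčšŝ\"y'ėûâ",
)

# Every Cyrillic letter maps to exactly one Latin character.
assert len(CYR) == len(LAT) == 66
TABLE = str.maketrans(CYR, LAT)


def translit(text: str) -> str:
    """Replace each Russian letter; leave everything else untouched."""
    return text.translate(TABLE)


print(translit("Москва"))
print(translit("Щука"))
```

Because the mapping is one character to one character, it can be
reversed by swapping the two strings, except that the ASCII quote
characters used for Ъ and Ь are ambiguous with ordinary punctuation.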

It is not possible to encode this in ISO-8859-1, because Ž, Č,
Š, Ŝ and Ė, as well as their lowercase forms, are not
available. ISO-8859-2 looks much better: it lacks only Ŝ, Ė and
Û. Maybe they can be replaced by Ş(ş), Ę(ę) and Ű(ű), or by some
other similar characters. Maybe Č(č) should be represented as
Ç(ç), which is also present in ISO-8859-1.

When the ISO-8859-2 encoded transliteration is interpreted as
ISO-8859-1, some characters will be misrepresented and look like
this:

ABVGDEË®ZIJKLMNOPRSTUFHCÇ©ª"Y'ÊÛÂ
abvgdeë¾zijklmnoprstufhcç¹º"y'êûâ
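
The two rows above can be reproduced with a few lines of Python. This is
only a sketch: the function names are invented for the example, and the
stand-in letters are assumptions (Ç for Č, Ş for Ŝ, Ę for Ė, and Ű for
Û, since Û has no code point in ISO-8859-2 either):

```python
def encodable(ch: str, codec: str) -> bool:
    """True if the character has a code point in the given charset."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def misread_latin2_as_latin1(text: str) -> str:
    """Encode as ISO-8859-2, then decode the same bytes as ISO-8859-1."""
    return text.encode("iso-8859-2").decode("iso-8859-1")


# Letters of the transliterated alphabet that ISO-8859-2 cannot hold.
missing = [c for c in "ËŽČŠŜĖÛÂ" if not encodable(c, "iso-8859-2")]
print(missing)

# Transliterated rows with the assumed stand-ins substituted.
upper = "ABVGDEËŽZIJKLMNOPRSTUFHCÇŠŞ\"Y'ĘŰÂ"
lower = "abvgdeëžzijklmnoprstufhcçšş\"y'ęűâ"
print(misread_latin2_as_latin1(upper))
print(misread_latin2_as_latin1(lower))
```

Only the letters whose byte values mean something else in ISO-8859-1
change; Ë, Ç and Â sit at the same positions in both charsets and
survive.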

However, this still looks better than decapitated KOI8-R:

abwgde3vzijklmnoprstufhc~{}yx|`q
ABWGDE#VZIJKLMNOPRSTUFHC^[]_YX\@Q
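
For comparison, the "clear the 8th bit" approach can be sketched like
this (Python again; strip_high_bit is a name made up for the example):

```python
def strip_high_bit(text: str) -> str:
    """Encode as KOI8-R, clear bit 7 of every byte, read back as ASCII."""
    raw = text.encode("koi8-r")
    return bytes(b & 0x7F for b in raw).decode("ascii")


lower = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"
print(strip_high_bit(lower))
print(strip_high_bit("Москва"))

# Uppercase Ъ is byte 0xFF in KOI8-R, so it turns into 0x7F (DEL),
# which is not printable.
print(repr(strip_high_bit("Ъ")))
```

The unprintable DEL is why the first row above is one character shorter
than the alphabet it came from.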

BTW: Why the hell are upper and lower case interchanged in 
KOI8-R???

OK, HTH
Thomas
 }:o{#

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
