On Wed, Apr 23, 2008 at 5:34 AM, R (Chandra) Chandrasekhar <[EMAIL PROTECTED]> wrote:
> Dear Folks,
>
> A scheme called ITRANS uses the ASCII printing character set and between
> one and three printing characters to unambiguously represent characters in
> Indic scripts or a Romanized script called IAST. Since characters in these
> scripts have Unicode code points, it should be possible to automate the
> translation between words in the ASCII source text and the desired Unicoded
> output text.
>
> I am trying to write a Perl script to do this and would appreciate advice
> on how best to proceed before I start.
>
> To give a better picture of what I am trying to do, I have given some
> examples below for ASCII to IAST characters:
>
> --------
> 1. Transliteration of between one and three ASCII printing characters to
> one Unicode character.
>
> 2. Many characters are unchanged by the transliteration.
>
> 3. Some transliteration examples are shown below:
>
> a    a  U+0061 LATIN SMALL LETTER A
> aa   ā  U+0101 LATIN SMALL LETTER A WITH MACRON
> A    ā  U+0101 LATIN SMALL LETTER A WITH MACRON
> .a   '  U+0027 APOSTROPHE
> ~N   ṅ  U+1E45 LATIN SMALL LETTER N WITH DOT ABOVE
> RRI  ṝ  U+1E5D LATIN SMALL LETTER R WITH DOT BELOW AND MACRON
> R^I  ṝ  U+1E5D LATIN SMALL LETTER R WITH DOT BELOW AND MACRON
> --------
>
> Many thanks.
>
> Chandra
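
The easiest way I can think of is to build a (UTF-8) file named
itrans2unicode.table that looks like this:

    a => a
    aa => ā
    ~N => ṅ

Then read that file into a hash at startup and process the input line by
line with a substitution like

    $line =~ s/($pattern)/$table{$1}/g;

where $pattern is an alternation of the quotemeta'd hash keys, sorted
longest first so that "aa" is tried before "a". (A plain
s/(.)/$table{$1}/g only ever sees one character at a time, so
multi-character codes such as "aa" or "~N" would never match, and any
character missing from the table would be replaced with nothing.)

There is supposedly a full table at
http://www.aczoom.com/itrans/#itransencoding but I was unable to load that
page.

-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.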
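
The table-driven approach described above can be sketched roughly as
follows. This is only a minimal sketch, not a complete ITRANS
implementation: the helper names (parse_table, build_pattern,
transliterate) are made up for illustration, the table entries are just
the examples from the original question, and the table is given inline
rather than read from itrans2unicode.table. It assumes the "key => value"
line format shown above and that longest-match-first is the right way to
resolve overlapping codes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                             # this source contains UTF-8 literals
binmode STDOUT, ':encoding(UTF-8)';   # emit UTF-8 on output

# Hypothetical helper: turn "key => value" lines into a hash.
# Lines that do not match that shape are silently skipped.
sub parse_table {
    my @lines = @_;
    my %table;
    for my $line (@lines) {
        next unless $line =~ /^\s*(\S+)\s*=>\s*(\S+)\s*$/;
        $table{$1} = $2;
    }
    return %table;
}

# Hypothetical helper: build one regex matching any key in the table.
# Keys are sorted longest first so "aa" wins over "a", and passed
# through quotemeta so "." and "^" are matched literally.
sub build_pattern {
    my ($table) = @_;
    die "empty table\n" unless %$table;
    my $alternation = join '|',
        map  { quotemeta }
        sort { length($b) <=> length($a) || $a cmp $b }
        keys %$table;
    return qr/($alternation)/;
}

# Replace every table key found in $text; other characters pass through.
sub transliterate {
    my ($text, $pattern, $table) = @_;
    $text =~ s/$pattern/$table->{$1}/g;
    return $text;
}

# Example entries taken from the question. In a real script these lines
# would be read from itrans2unicode.table, opened with
# '<:encoding(UTF-8)'.
my %table = parse_table(
    'a => a',
    'aa => ā',
    'A => ā',
    ".a => '",
    '~N => ṅ',
    'RRI => ṝ',
    'R^I => ṝ',
);
my $pattern = build_pattern(\%table);

print transliterate('raama', $pattern, \%table), "\n";   # prints "rāma"
```

Compiling the alternation once with qr// means the per-line work is a
single substitution, and characters with no table entry (the "r" and "m"
above) are left untouched because the regex simply does not match them.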