Hi,

I found the following in the unicode group at Yahoo Groups, referring to
ICU's implementation of ISCII <-> Unicode conversion:
http://groups.yahoo.com/group/unicode/message/11649

"ISCII is algorithmic. The mapping part to/from Unicode is fairly
straightforward because Unicode's encoding of Indic scripts is based on an
earlier version of ISCII. For details take a look at the source code:
http://oss.software.ibm.com/cvs/icu/~checkout~/icu/source/common/ucnvisci.c"

I will get back to you regarding TSCII conversion.

/vathanan

p.s. Sorry for the late reply. I was on vacation :(

----- Original Message -----
From: "Alexander Barkov" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>; "Vathanana Kumarathurai" <[EMAIL PROTECTED]>
Sent: Tuesday, March 26, 2002 11:44 PM
Subject: Re: Support for TSCII (Tamil charset)

> Hi!
>
> Starting from 3.2.0, mnogosearch works internally in Unicode.
> To add support for a new character set, one needs to
> write charset->unicode and unicode->charset conversion
> routines. This is very simple for European languages,
> which are mostly covered by 8-bit character sets.
> The only thing we need to add support for a simple charset
> of this kind is a charset->unicode mapping table. Such
> tables are available from ftp.unicode.org.
>
> ISCII seems to be a simple (almost!!!) charset in this sense.
> I found a mapping table for the MacGujarati charset here,
> which is almost the same as ISCII:
> http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/GUJARATI.TXT
> Where can we find an ISCII mapping table?
>
> Please also note this section in the mapping file:
>
> # Section 1: Map the following byte pairs as indicated:
> # (ZWNJ means ZERO WIDTH NON-JOINER, ZWJ means ZERO WIDTH JOINER)
> # (Also see note about 0xF0 in comments above)
>
> 0xA1+0xE9  0x0AD0         # GUJARATI OM
> 0xAA+0xE9  0x0AE0         # GUJARATI LETTER VOCALIC RR
> 0xDF+0xE9  0x0AC4         # GUJARATI VOWEL SIGN VOCALIC RR
> 0xE8+0xE8  0x0ACD+0x200C  # GUJARATI SIGN VIRAMA + ZWNJ # explicit halant
> 0xE8+0xE9  0x0ACD+0x200D  # GUJARATI SIGN VIRAMA + ZWJ # soft halant
>
> So some additional coding (~10-15 minutes) is required to
> take these pairs into account. If you find a conversion map
> for ISCII, we'll add it to the next release.
>
>
> TSCII seems to be a very complex charset.
> I noticed this while reading these two documents:
> http://www.xfree86.org/pipermail/i18n/2001-August/002246.html
> http://www.geocities.com/Athens/5180/tscii4.html
>
> If you want to contribute to the project by implementing TSCII support,
> feel free to send us patches. I would recommend using the
> latest CVS sources as a starting point, because the recoding tools have
> changed since 3.2.3.
>
> Regards!
>
>
> Vathanana Kumarathurai wrote:
>
> > Hi,
> >
> > TSCII
> > ====
> > Here are a few links to TSCII sample texts and the fonts needed to see the texts.
> >
> > TSCII: Fonts, Keyboard Drivers and Converters
> > -- http://www.tamil.net/tscii/tools.html
> >
> > Sample Web Pages in Tamil based on TSCII format
> > -- http://www.geocities.com/Athens/5180/tsctst11.html
> > -- http://www.geocities.com/Athens/5180/devsngs.html
> > -- http://www.geocities.com/Athens/5180/barati1T.html
> >
> > A website using TSCII
> > -- http://www.aaraamthinai.com/
> >
> >
> > ISCII
> > ====
> > ISCII is more tricky though... :( I haven't been able to find much
> > literature in Tamil based on ISCII. I will get back to you about this later.
> >
> > The Indian Institute of Information Technology has developed an ISCII plug-in
> > for browsers to view texts in the Indian scripts. ISCII Plug-In
> > -- http://www.iiit.net/ltrc/iscii/index.htm
> >
>
> ___________________________________________
> If you want to unsubscribe send "unsubscribe general"
> to [EMAIL PROTECTED]
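
For reference, the byte pairs from Section 1 of the quoted mapping file could be handled with a one-byte lookahead in the charset->unicode routine. This is only a rough Python sketch of the idea, not mnogosearch or ICU code. The `PAIRS` table is copied from the quoted section. The `SINGLE` table and the names `decode`, `PAIRS` and `SINGLE` are my own placeholders: the three single-byte entries are assumptions that should be checked against GUJARATI.TXT, and a real converter would load the full table from that file. The sketch also assumes bytes below 0x80 pass through as ASCII.

```python
# Byte pairs quoted from Section 1 of GUJARATI.TXT in the message above.
PAIRS = {
    (0xA1, 0xE9): "\u0AD0",        # GUJARATI OM
    (0xAA, 0xE9): "\u0AE0",        # GUJARATI LETTER VOCALIC RR
    (0xDF, 0xE9): "\u0AC4",        # GUJARATI VOWEL SIGN VOCALIC RR
    (0xE8, 0xE8): "\u0ACD\u200C",  # VIRAMA + ZWNJ (explicit halant)
    (0xE8, 0xE9): "\u0ACD\u200D",  # VIRAMA + ZWJ (soft halant)
}

# Placeholder single-byte table: illustrative entries only, assumed rather
# than copied from the mapping file. Load the real table in practice.
SINGLE = {
    0xB3: "\u0A95",  # assumed: GUJARATI LETTER KA
    0xE8: "\u0ACD",  # assumed: GUJARATI SIGN VIRAMA
    0xE9: "\u0ABC",  # assumed: GUJARATI SIGN NUKTA
}


def decode(data: bytes, single=SINGLE, pairs=PAIRS) -> str:
    """Convert charset bytes to a Unicode string, trying byte pairs first."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        # Look one byte ahead: a listed pair takes priority over two singles.
        if i + 1 < len(data) and (b, data[i + 1]) in pairs:
            out.append(pairs[(b, data[i + 1])])
            i += 2
            continue
        if b < 0x80:
            out.append(chr(b))  # assumption: low half is plain ASCII
        else:
            # Unknown high bytes become U+FFFD REPLACEMENT CHARACTER.
            out.append(single.get(b, "\uFFFD"))
        i += 1
    return "".join(out)
```

With this, `decode(bytes([0xE8, 0xE9]))` gives VIRAMA + ZWJ as one unit instead of a virama followed by a nukta. The reverse direction (unicode->charset) would need the same kind of lookahead on code point pairs such as 0x0ACD+0x200C.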
