Re: Support for TSCII (Tamil charset)

Vathanan Kumarathurai Tue, 02 Apr 2002 00:24:37 -0800

Hi,

I found the following in the unicode group on Yahoo Groups, referring to ICU's
implementation of ISCII <-> Unicode conversion:
http://groups.yahoo.com/group/unicode/message/11649


  "ISCII is algorithmic. The mapping part to/from Unicode is fairly
  straightforward because Unicode's encoding of Indic scripts is based on an
  earlier version of ISCII.

  For details take a look at the source code:

  http://oss.software.ibm.com/cvs/icu/~checkout~/icu/source/common/ucnvisci.c"


I will get back to you regarding TSCII conversion.

/vathanan
P.S. Sorry for the late reply; I was on vacation :(

----- Original Message -----
From: "Alexander Barkov" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>; "Vathanana Kumarathurai"
<[EMAIL PROTECTED]>
Sent: Tuesday, March 26, 2002 11:44 PM
Subject: Re: Support for TSCII (Tamil charset)


> Hi!
>
> Starting from 3.2.0, mnogosearch works internally in Unicode.
> To add support for a new character set, one needs to
> write charset->unicode and unicode->charset conversion
> routines. This is very simple for European languages,
> which are mostly covered by 8-bit character sets.
> All we need to support a simple charset of this kind
> is a charset->unicode mapping table. Such
> tables are available from ftp.unicode.org.
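A minimal Python sketch of the table-driven approach described above, assuming a mapping file in the layout used on unicode.org (byte, tab, code point, tab, "#" comment). load_mapping, decode and encode are hypothetical names, not mnogosearch functions; the two sample lines are ISO-8859-1 entries. The unicode->charset direction is simply the inverted table.

```python
# Table-driven 8-bit charset <-> Unicode conversion from a unicode.org-style
# mapping file. Hypothetical helper names; not mnogosearch's actual code.

SAMPLE = """\
# Illustrative excerpt in the unicode.org mapping-file layout
0x41\t0x0041\t# LATIN CAPITAL LETTER A
0xE9\t0x00E9\t# LATIN SMALL LETTER E WITH ACUTE
"""


def load_mapping(text: str):
    """Return (byte -> code point, code point -> byte) tables."""
    to_unicode = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        # Skip undefined bytes and multi-byte entries such as "0xA1+0xE9".
        if len(fields) < 2 or "+" in fields[0] or "+" in fields[1]:
            continue
        to_unicode[int(fields[0], 16)] = int(fields[1], 16)
    # Invert for unicode->charset; if two bytes share a code point, the last wins.
    from_unicode = {cp: b for b, cp in to_unicode.items()}
    return to_unicode, from_unicode


def decode(data: bytes, to_unicode) -> str:
    """charset -> Unicode; unmapped bytes become U+FFFD."""
    return "".join(chr(to_unicode.get(b, 0xFFFD)) for b in data)


def encode(s: str, from_unicode, fallback: int = ord("?")) -> bytes:
    """Unicode -> charset; unmapped characters become the fallback byte."""
    return bytes(from_unicode.get(ord(ch), fallback) for ch in s)
```

With to_u, from_u = load_mapping(SAMPLE), decode(b"A\xe9", to_u) returns "Aé" and encode("Aé", from_u) returns b"A\xe9". If two bytes map to the same code point, the inverted table keeps only the last one listed, so a real converter has to choose the preferred byte explicitly.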
>
> ISCII seems to be a simple (almost!!!) charset in this sense.
> I found a mapping table for the MacGujarati charset here,
> which is almost the same as ISCII:
> http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/GUJARATI.TXT
> Where can we find an ISCII mapping table?
>
> Please also note this section in the mapping file:
>
> # Section 1: Map the following byte pairs as indicated:
> # (ZWNJ means ZERO WIDTH NON-JOINER, ZWJ means ZERO WIDTH JOINER)
> # (Also see note about 0xF0 in comments above)
>
> 0xA1+0xE9	0x0AD0	# GUJARATI OM
> 0xAA+0xE9	0x0AE0	# GUJARATI LETTER VOCALIC RR
> 0xDF+0xE9	0x0AC4	# GUJARATI VOWEL SIGN VOCALIC RR
> 0xE8+0xE8	0x0ACD+0x200C	# GUJARATI SIGN VIRAMA + ZWNJ # explicit halant
> 0xE8+0xE9	0x0ACD+0x200D	# GUJARATI SIGN VIRAMA + ZWJ # soft halant
>
> So some additional coding (~10-15 minutes) is required to
> take these pairs into account. If you find a conversion map
> for ISCII, we'll add it to the next release.
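The pair handling mentioned above amounts to a longest-match lookup: try the two-byte sequence first, and fall back to the single byte. A minimal Python sketch, using the five pairs quoted from GUJARATI.TXT; the three single-byte entries (KA, VIRAMA, NUKTA) are assumptions following the ISCII layout and should be checked against the file, and decode_with_pairs is a hypothetical name.

```python
# Longest-match decoding for a charset whose mapping file contains byte pairs.

# Pair entries quoted from the MacGujarati mapping file ("Section 1").
PAIRS = {
    (0xA1, 0xE9): (0x0AD0,),         # GUJARATI OM
    (0xAA, 0xE9): (0x0AE0,),         # GUJARATI LETTER VOCALIC RR
    (0xDF, 0xE9): (0x0AC4,),         # GUJARATI VOWEL SIGN VOCALIC RR
    (0xE8, 0xE8): (0x0ACD, 0x200C),  # VIRAMA + ZWNJ (explicit halant)
    (0xE8, 0xE9): (0x0ACD, 0x200D),  # VIRAMA + ZWJ (soft halant)
}

# ASSUMED single-byte entries (small subset; verify against GUJARATI.TXT).
SINGLES = {0xB3: 0x0A95, 0xE8: 0x0ACD, 0xE9: 0x0ABC}  # KA, VIRAMA, NUKTA


def decode_with_pairs(data: bytes) -> str:
    out = []
    i = 0
    while i < len(data):
        pair = tuple(data[i:i + 2])  # one element at the end of the input
        if pair in PAIRS:            # longest match wins
            out.extend(chr(cp) for cp in PAIRS[pair])
            i += 2
            continue
        b = data[i]
        if b < 0x80:
            out.append(chr(b))       # ASCII passes through
        else:
            out.append(chr(SINGLES.get(b, 0xFFFD)))
        i += 1
    return "".join(out)
```

For instance, b"\xb3\xe8\xe8" decodes to KA + VIRAMA + ZWNJ, while b"\xb3\xe8" decodes to KA + VIRAMA. Encoding back from Unicode needs the mirror-image rule: check for the two-code-point sequences (e.g. U+0ACD U+200C) before the single code points.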
>
>
> TSCII seems to be a very complex charset.
> I noticed this while reading these two documents:
> http://www.xfree86.org/pipermail/i18n/2001-August/002246.html
> http://www.geocities.com/Athens/5180/tscii4.html
>
> If you want to contribute to the project by implementing TSCII support,
> feel free to send us patches. I would recommend using the
> latest CVS sources as a starting point, because the recoding tools
> have changed since 3.2.3.
>
> Regards!
>
>
> Vathanana Kumarathurai wrote:
>
> > Hi,
> >
> > TSCII
> > ====
> > Here are a few links to TSCII sample texts and the fonts needed to view them.
> >
> > TSCII: Fonts, Keyboard Drivers and Converters
> > -- http://www.tamil.net/tscii/tools.html
> >
> > Sample Web Pages in Tamil based on TSCII format
> >   -- http://www.geocities.com/Athens/5180/tsctst11.html
> >   -- http://www.geocities.com/Athens/5180/devsngs.html
> >   -- http://www.geocities.com/Athens/5180/barati1T.html
> >
> > A website using TSCII
> > -- http://www.aaraamthinai.com/
> >
> >
> >
> > ISCII
> > ====
> > ISCII is more tricky, though... :( I haven't been able to find much
> > literature in Tamil based on ISCII. I will get back to you about this later.
> >
> > The Indian Institute of Information Technology has developed an ISCII
> > plug-in that lets browsers display texts in Indian scripts. ISCII Plug-In
> > -- http://www.iiit.net/ltrc/iscii/index.htm
>
>
>
>
> ___________________________________________
> If you want to unsubscribe send "unsubscribe general"
> to [EMAIL PROTECTED]
>
>
