Hi!

Starting with 3.2.0, mnogosearch works in Unicode internally.
To add support for a new character set, one needs to
write charset->unicode and unicode->charset conversion
routines. This is very simple for European languages,
which are mostly covered by 8-bit character sets.
To add support for a simple charset of this kind, the only
thing we need is a charset->unicode mapping table. Such
tables are available from ftp.unicode.org.
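
To illustrate how little is needed for an 8-bit charset, here is a minimal
Python sketch (not mnogosearch code, which is written in C). It assumes a
table in the usual "0xBYTE 0xCODEPOINT # name" layout; the three sample
rows and the names load_table, decode and encode are made up for this
example.

```python
import re

# Tiny illustrative excerpt in the "0xBYTE<tab>0xCODEPOINT<tab># name" layout.
# It is NOT a complete table for any real charset.
SAMPLE_TABLE = """\
# illustrative excerpt only
0x41\t0x0041\t# LATIN CAPITAL LETTER A
0xC1\t0x0430\t# CYRILLIC SMALL LETTER A
0xC2\t0x0431\t# CYRILLIC SMALL LETTER BE
"""

# One byte value followed by one Unicode code point, both in hex.
LINE_RE = re.compile(r"^0x([0-9A-Fa-f]{2})\s+0x([0-9A-Fa-f]{4,6})")


def load_table(text):
    """Parse a mapping table and return (byte->codepoint, codepoint->byte)."""
    to_unicode = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # comment, blank line, or a row this sketch does not handle
        to_unicode[int(match.group(1), 16)] = int(match.group(2), 16)
    # The reverse direction is just the inverted dictionary.
    from_unicode = {cp: byte for byte, cp in to_unicode.items()}
    return to_unicode, from_unicode


def decode(data, to_unicode, replacement="\ufffd"):
    """charset -> unicode: one table lookup per input byte."""
    return "".join(
        chr(to_unicode[b]) if b in to_unicode else replacement for b in data
    )


def encode(text, from_unicode, replacement=ord("?")):
    """unicode -> charset: one table lookup per input character."""
    return bytes(from_unicode.get(ord(ch), replacement) for ch in text)
```

Both directions come from the same table, which is why a single mapping
file is enough for a charset of this kind.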

ISCII seems to be a simple (almost!!!) charset in this sense.
I found a mapping table for the MacGujarati charset, which is
almost the same as ISCII, here:
http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/GUJARATI.TXT
Where can we find an ISCII mapping table?

Please also note this section in the mapping file:

# Section 1: Map the following byte pairs as indicated:
# (ZWNJ means ZERO WIDTH NON-JOINER, ZWJ means ZERO WIDTH JOINER)
# (Also see note about 0xF0 in comments above)

0xA1+0xE9  0x0AD0         # GUJARATI OM
0xAA+0xE9  0x0AE0         # GUJARATI LETTER VOCALIC RR
0xDF+0xE9  0x0AC4         # GUJARATI VOWEL SIGN VOCALIC RR
0xE8+0xE8  0x0ACD+0x200C  # GUJARATI SIGN VIRAMA + ZWNJ   # explicit halant
0xE8+0xE9  0x0ACD+0x200D  # GUJARATI SIGN VIRAMA + ZWJ    # soft halant

So some additional coding (~10-15 minutes) is required to
take these pairs into account. If you find a conversion map
for ISCII, we'll add it to the next release.
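
A rough Python sketch of that extra step (not the actual mnogosearch
implementation): before the ordinary single-byte lookup, check whether
the current byte and the next one form one of the pairs listed in the
section quoted above. The name decode_with_pairs and the ASCII-only
"singles" table in the usage note are placeholders; a real single-byte
table would come from the rest of the mapping file.

```python
# Two-byte sequences copied from "Section 1" of the mapping file quoted above.
PAIRS = {
    (0xA1, 0xE9): "\u0AD0",        # GUJARATI OM
    (0xAA, 0xE9): "\u0AE0",        # GUJARATI LETTER VOCALIC RR
    (0xDF, 0xE9): "\u0AC4",        # GUJARATI VOWEL SIGN VOCALIC RR
    (0xE8, 0xE8): "\u0ACD\u200C",  # VIRAMA + ZWNJ (explicit halant)
    (0xE8, 0xE9): "\u0ACD\u200D",  # VIRAMA + ZWJ (soft halant)
}


def decode_with_pairs(data, singles, pairs=PAIRS, replacement="\ufffd"):
    """Decode bytes, trying a two-byte pair before the single-byte table.

    `singles` maps a byte value to a string; bytes missing from it are
    emitted as `replacement`.
    """
    out = []
    i = 0
    while i < len(data):
        pair = tuple(data[i:i + 2])  # one element only at the last byte
        if pair in pairs:
            out.append(pairs[pair])
            i += 2                   # both bytes are consumed
        else:
            out.append(singles.get(data[i], replacement))
            i += 1
    return "".join(out)
```

For example, with a placeholder table such as
singles = {b: chr(b) for b in range(0x80)}, the input b"a\xe8\xe9b"
decodes to "a", U+0ACD, U+200D, "b". The reverse direction needs the
same care: these two-character Unicode sequences have to be matched
before single characters.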


TSCII seems to be a very complex charset.
I noticed this while reading these two documents:
http://www.xfree86.org/pipermail/i18n/2001-August/002246.html
http://www.geocities.com/Athens/5180/tscii4.html

If you want to contribute to the project by implementing TSCII support,
feel free to send us patches. I would recommend using the
latest CVS sources as a starting point, because the recoding tools have
changed since 3.2.3.

Regards!


Vathanana Kumarathurai wrote:

> Hi,
> 
> TSCII
> ====
> Here are a few links to TSCII sample texts and the fonts needed to see the texts.
> 
> TSCII: Fonts, Keyboard Drivers and Converters
> -- http://www.tamil.net/tscii/tools.html
> 
> Sample Web Pages in Tamil based on TSCII format
>   -- http://www.geocities.com/Athens/5180/tsctst11.html
>   -- http://www.geocities.com/Athens/5180/devsngs.html
>   -- http://www.geocities.com/Athens/5180/barati1T.html
> 
> A website using TSCII
> -- http://www.aaraamthinai.com/
> 
> 
> 
> ISCII
> ====
> ISCII is more tricky though... :( I haven't been able to find much
> literature in Tamil based on ISCII. I will get back to you about this later.
> 
> Indian Institute of Information Technology has developed an ISCII plug-in
> for the browsers to view the texts in the Indian Script. ISCII Plug-In
> -- http://www.iiit.net/ltrc/iscii/index.htm



