Hi,

I have finally found the time to write up various aspects of the work on improving the performance of aspell for Hindi, the idea being that this would serve as a basis for doing the same in other Indian languages. Here are some links:

1. <http://buckycat.wordpress.com/2007/04/28/sorcery-in-indian-language-spelling/>
   A blog entry on Indian-language spell-checking, including a table demonstrating that with the addition of phonetic rules, aspell performance in Hindi is at least on par with that for English.

2. <http://cmwiki.sarai.net/index.php/SpellCheck>
   Some notes on open-source spell-checkers, focussing in particular on how aspell works.

3. <http://cmwiki.sarai.net/index.php/PhoneticDetails>
   A writeup on the details of how the phonetic rules for Hindi work, and notes on how to adapt these for other Indian languages. A link to the Hindi phonetic rules files is also provided, and I will soon roll this into the official aspell Hindi dictionary.

I would like volunteers to help out in the following tasks:

(a) Review the phonetic rules described in 3, and come up with more.

(b) Adapt these rules to other languages. I will probably do Oriya, and work in Punjabi is under way. Kartik has volunteered to do Gujarati and Hari Prasad to do Kannada, and we have a Sarai FLOSS fellowship proposal to do Marathi (among other things).

(c) Vastly improve existing spell-checking dictionaries, as these rules are not of much use without an adequate dictionary. One way is to have someone type in dictionaries that are out of copyright.

(d) Help out in adding more aspell rules. Another potential area of great improvement is adding affix (prefix/suffix) rules. This is best done along with preparing a new dictionary.

(e) I will soon put up project plans for various aspell work, including plugins for Scribus, OpenOffice, and the Mozilla suite, bindings in other programming languages, and a stand-alone GUI front-end to aspell. Programmers, please help out here.

Regards,
Gora

_______________________________________________
ilugd mailinglist -- [email protected]
http://frodo.hserus.net/mailman/listinfo/ilugd
Archives at: http://news.gmane.org/gmane.user-groups.linux.delhi http://www.mail-archive.com/[email protected]/
