---------- Forwarded message ----------
Date: Mon, 30 Sep 2002 12:35:20 +0530
From: Dr Abhijit Das <[EMAIL PROTECTED]>
To: Kaushik Ghose <[EMAIL PROTECTED]>
Subject: Re: [ilug-cal] dictionary project

> So Barda (avijit das, <[EMAIL PROTECTED]>) has a list of words with his
> bengali writer distribution, for his spell checker. There are 112,943
> words in that list encoded in ISCII, barda's bengali writer format and one
> other format which I can't figure out right now. I don't know how many of
> those words are "duplicate" ie noun/verb forms etc.

No words are duplicated. Each word's line records its part of speech as a
code: 0 (noun), 1 (adjective and adverb), 2 (pronoun), 3 (abyay), 4 (verb).
But different verb forms (kori, korun, koruk, korchhilo etc.) are listed
separately. Without these verb forms the dictionary has about 50,000 words.

But the greatest danger in using this dictionary is that it has not been
spell-checked yet! Since there are (apparently) no other databases, this
has to be done manually! A pain. I would sincerely appreciate volunteers'
efforts in this regard.

> I'm going to try to convert the IISC part to unicode (utf-8) (there are
> code snippets out there which do this) and that should be a good spring
> board. (My plan right now is to tweak barda's list and his spell check
> algorithm to run on Lekho)

The latest version (3.0) of bwedit has a doc directory. One file in that
directory describes the ISCII encoding of the Bengali alphabet in explicit
detail. There is a second document on the spell-checking algorithm. If you
read these, there should be no problem in porting the database and the
spell-checker to any other system (like Lekho). But the essential problem
remains: spell-checking the spell-checker!!!
Dr Abhijit Das
Monday September 30 2002 12:26 PM (IST)

+-----------------------------------------------------------+
| Dr Abhijit Das                                            |
| Visiting Faculty, Department of Mathematics               |
| Indian Institute of Technology, Kanpur 208 016, UP, India |
| Phone: +91-512-597753 (off), +91-512-598334 (res)         |
| E-mail: [EMAIL PROTECTED], [EMAIL PROTECTED]              |
| URL: http://home.iitk.ac.in/~abhijit/                     |
+-----------------------------------------------------------+
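
For anyone picking up the word list, here is a rough sketch of how the part-of-speech codes described in the message might be read and tallied once the list has been converted to plain text. The code table (0 to 4) comes from the message. Everything else is an assumption made for illustration: the message does not say how a line is laid out, so the "word, whitespace, code" format, the function names `parse_entry` and `count_by_pos`, and the sample lines are all hypothetical. Check the documents in the bwedit doc directory for the real layout.

```python
from collections import Counter

# Part-of-speech codes as listed in the message.
POS_LABELS = {
    0: "noun",
    1: "adjective/adverb",
    2: "pronoun",
    3: "abyay",
    4: "verb",
}


def parse_entry(line):
    """Split one word-list line into (word, pos_code).

    ASSUMPTION: the word comes first and the numeric code last, separated
    by whitespace. The actual bwedit layout may differ.
    """
    word, code_text = line.rsplit(None, 1)
    code = int(code_text)
    if code not in POS_LABELS:
        raise ValueError(f"unknown part-of-speech code: {code}")
    return word, code


def count_by_pos(lines):
    """Count entries per part of speech, skipping blank lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        _, code = parse_entry(line)
        counts[POS_LABELS[code]] += 1
    return counts


# Illustrative sample only: the four verb forms are the ones named in the
# message; the last line is a made-up adjective entry.
sample = ["kori 4", "korun 4", "koruk 4", "korchhilo 4", "bhalo 1"]
counts = count_by_pos(sample)
```

A tally like this would give a quick check of the "about 50,000 words without verb forms" figure, though only if every inflected verb form carries code 4 and base forms can be told apart, which the message does not state.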
