Hi,

> Why don't you start by creating a list of currently available Bangla 
> dictionaries in print, followed by older, out-of-print ones.  If you 
> are really serious I can try to contact Bangla experts at the Asiatic 
> Society to help w/ preliminary research.

Robin might have done some groundwork on this. Thank you very much for the
offer!

> 
> The problem with a 'small' dictionary is that people will get turned off 
> very quickly, once they discover that it does not contain words they 
> are looking for and stop consulting it forever.  Hence the dictionary 
> has to be nearly complete before it is unveiled and advertised to the 
> public-at-large.

Sure, I think it'd still be nice to start out with a web based interactive
system (with restricted access for writes) so people can see the progress,
contribute words/meanings and get interested in general. 

> 
> A more modest, but still highly useful, effort may be to start with 
> creating a list of "all" Bangla words.  This could be used for 
> spell-checkers and could, in the future, be the basis for creating the 
> dict.  Even this is a daunting task.

Barda (Avijit Das, <[EMAIL PROTECTED]>) ships a list of words with his
Bengali Writer distribution, for his spell checker. There are 112,943
words in that list, encoded in ISCII, Barda's Bengali Writer format, and one
other format which I can't figure out right now. I don't know how many of
those words are "duplicates", i.e. noun/verb forms etc.

I'm going to try to convert the ISCII part to Unicode (UTF-8) (there are
code snippets out there which do this) and that should be a good
springboard. (My plan right now is to tweak Barda's list and his spell check
algorithm to run on Lekho.)
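
A rough sketch of what such a conversion could look like, in Python. This is
not one of the existing snippets, just a table-driven illustration. The names
PARTIAL_TABLE, iscii_to_unicode and convert_file are made up for this example,
and the table is a tiny, incomplete subset that should be checked against the
ISCII standard before use. A full converter would likely also need to handle
byte sequences that cannot be mapped one byte at a time.

```python
# Sketch only: byte-by-byte ISCII -> Unicode conversion for Bengali text.
# PARTIAL_TABLE is an illustrative, incomplete subset (an assumption, to be
# verified against the ISCII standard), not a complete mapping.
PARTIAL_TABLE = {
    0xA4: "\u0985",  # BENGALI LETTER A
    0xB3: "\u0995",  # BENGALI LETTER KA
    0xE8: "\u09CD",  # BENGALI SIGN VIRAMA (halant)
}


def iscii_to_unicode(data, table):
    """Map each ISCII byte to a Unicode string using the given table."""
    out = []
    for byte in data:
        if byte < 0x80:
            # The lower half of ISCII is plain ASCII, so pass it through.
            out.append(chr(byte))
        else:
            # Bytes missing from the table become U+FFFD so they are easy
            # to spot and fix later.
            out.append(table.get(byte, "\ufffd"))
    return "".join(out)


def convert_file(src_path, dst_path, table=PARTIAL_TABLE):
    """Read an ISCII file as raw bytes and write it back out as UTF-8."""
    with open(src_path, "rb") as src:
        text = iscii_to_unicode(src.read(), table)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(text)
```

Counting the U+FFFD characters in the output gives a quick idea of how much of
the table is still missing.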

> 
> If you have set-up the system for typing in Bangla - I strongly 
> recommend that you type in a sizeable number of words (100?), and time 
> yourself to see how long it takes.  Do the same test w/ dict. entries. 
> This would give you a more accurate basis for estimates on total time.
>
Absolutely. I'll mail the people at the Digital South Asia Library and see
if they have data on this, if they are going to publicise their work, etc.

http://dsal.uchicago.edu/contact.html
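
The timing test suggested above boils down to simple arithmetic, sketched
here in Python. The function name estimate_hours and the sample figures (100
entries typed in 600 seconds) are hypothetical placeholders, not measurements.
Only the 112,943 figure comes from the word list mentioned earlier, and it is
used here purely as an example of scale.

```python
def estimate_hours(sample_size, sample_seconds, total_entries):
    """Extrapolate total typing time (in hours) from a timed sample."""
    seconds_per_entry = sample_seconds / sample_size
    return seconds_per_entry * total_entries / 3600


# Hypothetical sample: 100 entries typed in 600 seconds (6 s per entry),
# scaled up to a list of 112,943 entries.
hours = estimate_hours(100, 600, 112943)
print(f"about {hours:.1f} hours of typing")
```

Running the same calculation with a separately timed sample of full
dictionary entries would give the second, presumably much larger, estimate.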
 
> True.  I hope that you muster the will to stay with it.  But if you fail 
> to understand the difficulty and enormity of what you are undertaking 
> you will get discouraged once the reality hits and may abandon the 
> effort half-way.  This may then negatively influence others  - hence my 
> early warnings.

Hopefully it'll be like other OS projects, where who starts it is
irrelevant: as long as people decide the project is useful, the project
will be there.

> 
> But I earnestly hope that you will take this on and we will, several 
> years from now, be able to name Kaushik Ghose as the author of the first 
> OS Bangla cyber-dictionary.

Now that part is unlikely to happen. If the experiment succeeds there'll
be a whole lot of authors on that list.... 

> 
> [ There is an OS OCR called Clara - however it is geared towards roman 
> scripts and will need extensive fiddling to get it to understand any 
> Bangla font.  But about half the code, the part used to align the page, 
> line and word, can be reused - just the hard part - recognizing individual 
> characters - will need to be re-written.  If you are *really* into C 
> programming - you could look into it - that is what I would do - if I 
> had the time. ]
> 
I played with OCR during part of my master's education, but I've since
switched fields completely. This is one thing I'm NOT going to take up,
I'm pretty sure, really, honestly...


take care ! 
-kaushik


