Kaushik Ghose wrote:

> That's a great idea, anyone know anybody in the dictionary business with
> words in digital format ?

Why don't you start by creating a list of currently available Bangla 
dictionaries in print, followed by older, out-of-print ones?  If you 
are really serious, I can try to contact Bangla experts at the Asiatic 
Society to help with preliminary research.

> Well, I'd suggest starting on a more modest basis like 5000 commonly used
> words.

The problem with a 'small' dictionary is that people will get turned off 
very quickly once they discover that it does not contain the words they 
are looking for, and will stop consulting it forever.  Hence the 
dictionary has to be nearly complete before it is unveiled and 
advertised to the public-at-large.

A more modest, but still highly useful, effort may be to start with 
creating a list of "all" Bangla words.  This could be used for 
spell-checkers and could, in the future, be the basis for creating the 
dict.  Even this is a daunting task.
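
To make the spell-checker use concrete, here is a minimal sketch in 
Python.  It is only an illustration: the three-word list and the 
function names are made up, and splitting on whitespace ignores 
punctuation, which a real checker would have to handle.

```python
import unicodedata

def normalize(word):
    # NFC normalization, so canonically equivalent code point
    # sequences for the same word compare equal.
    return unicodedata.normalize("NFC", word.strip())

def load_word_list(lines):
    # Build a set of normalized words, skipping blank lines.
    return {normalize(line) for line in lines if line.strip()}

def unknown_words(text, known):
    # Return the whitespace-separated tokens that are not in the list.
    return [w for w in text.split() if normalize(w) not in known]

# Hypothetical word list; a real one would be read from a UTF-8 file.
known = load_word_list(["বাংলা", "ভাষা", "অভিধান"])
print(unknown_words("বাংলা ভাষা xyz", known))
```

The larger the word list, the fewer false alarms such a checker 
raises, which is why the list needs to be close to complete.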

If you have set up a system for typing in Bangla, I strongly 
recommend that you type in a sizeable number of words (100?) and time 
yourself to see how long it takes.  Do the same test with dictionary 
entries.  This would give you a more accurate basis for estimating the 
total time.
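
The extrapolation from such a timed sample is simple enough to sketch 
in Python.  The figures below (100 words in 25 minutes, a target of 
50,000 words) are placeholders I made up, not measurements; substitute 
your own.

```python
def estimate_total_hours(sample_items, sample_minutes, total_items):
    """Extrapolate total typing time in hours from a timed sample."""
    if sample_items <= 0:
        raise ValueError("sample_items must be positive")
    minutes_per_item = sample_minutes / sample_items
    return minutes_per_item * total_items / 60.0

# Hypothetical sample: 100 words typed in 25 minutes, 50,000 words wanted.
word_hours = estimate_total_hours(100, 25, 50_000)
print(f"{word_hours:.1f} hours")
```

Run it once with the timing for bare words and once with the timing for 
full dictionary entries to compare the two efforts.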

> Probably not just 3 or four people, but as a community effort it might be
> a different story.
> 
> So an estimate for a web contributed dictionary would go
> like this
> 
> 20 hits (users) a day
> 2 words per user (user fills out translated word, perhaps meaning and
> thats about it)
> 40 * 365 = 14,600 words/year , clearly impossible - or ?

In my opinion, that is vastly optimistic.  But I have no way of getting 
to more accurate estimates.
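
For anyone who wants to play with the numbers, here is a rough sketch 
in Python of the arithmetic quoted above.  The 50,000-word target and 
the lower contribution rate are assumptions of mine for illustration 
only.

```python
def words_per_year(users_per_day, words_per_user, days=365):
    # Contributions per year at a steady daily rate.
    return users_per_day * words_per_user * days

def years_to_reach(target_words, users_per_day, words_per_user):
    # Years needed to collect target_words at that rate.
    rate = words_per_year(users_per_day, words_per_user)
    if rate <= 0:
        raise ValueError("contribution rate must be positive")
    return target_words / rate

# The quoted estimate: 20 users a day, 2 words each.
print(words_per_year(20, 2))

# A hypothetical, more pessimistic rate: 2 users a day, 1 word each,
# against an assumed target of 50,000 words.
print(round(years_to_reach(50_000, 2, 1), 1))
```

The result is very sensitive to the daily rate, which is exactly the 
figure nobody can estimate well in advance.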

If you are serious - think of the list of project participants as 1 - 
yourself.  At best, you will get a few persons (like me) to help you 
from time to time without taking up too much responsibility.  If you are 
very, very lucky you might get one or two other serious participants - 
but only after you have proven that the dictionary/word-list is an 
on-going project.

> 
> You never know until you try...

True.  I hope that you muster the will to stay with it.  But if you fail 
to understand the difficulty and enormity of what you are undertaking 
you will get discouraged once the reality hits and may abandon the 
effort half-way.  This may then negatively influence others  - hence my 
early warnings.

But I earnestly hope that you will take this on and that we will, 
several years from now, be able to name Kaushik Ghose as the author of 
the first OS Bangla cyber-dictionary.

[ There is an OS OCR program called Clara.  However, it is geared 
towards Roman scripts and will need extensive fiddling to get it to 
understand any Bangla font.  About half the code, the part used to 
align the page, lines and words, can be reused; only the hard part, 
recognizing individual characters, will need to be rewritten.  If you 
are *really* into C programming, you could look into it.  That is what 
I would do if I had the time. ]

-- 
Raja Guha
---------


