On Sat, Jul 25, 2009 at 12:41 AM, Rajeev J Sebastian wrote:
>
> On Fri, Jul 24, 2009 at 7:02 PM, JAGANADH G wrote:
> >
> > On Fri, Jul 24, 2009 at 5:29 PM, Rajeev J Sebastian wrote:
> >>
> >> On Fri, Jul 24, 2009 at 5:19 PM, Varewoolf wrote:
> >> >
> >> > I am very interested in making this happen... I have always been
> >> > interested in linguistics...
> >> > Can anybody tell me what we primarily need?
> >>
> >> How about ...
> >>
> >> 1) 50+ years of research (actually, 2000 if you consider Panini)
> >
> > Is it history? If you work hard, you can knock a zero off it.
>
> Huh?
>
> >>
> >> 2) Extremely large corpus ... if you want to make a practical system
> >
> > Only if you adopt a corpus-based model. That is not going to be practical
> > right now in the case of English to Malayalam translation.
>
> It is not practical to make *anything* without a corpus. Even if you
> use a non-corpus-based methodology to perform translation, you still
> need a large corpus to *validate* that your method works for more than
> toy examples. This is the biggest problem facing any NLP work for
> Indic languages, and one that some glorified institutions in India
> neither build up nor share, most probably because all their systems
> are capable of is translating toy examples.

I know that there are non-free systems under development which are more
advanced than the Google Translate service (English-Hindi). But I don't
know when they will release them.

>
> >>
> >> 3) Large and talented team good in computational linguistics
> >
> > Where is it? We can build one up.
>
> Best of luck.
>
> >>
> >> 4) a very practical theory that can model language effectively for
> >> your purposes (seriously lacking for even small use cases in even
> >> major languages)
> >
> > A perfect grammar for Malayalam is required, especially for syntax and
> > morphology. Malayalam really lacks such studies.
>
> I don't think any language has such an in-depth model that could be
> used for generic MT. There are, of course, special-case models ...
> which can be used for special cases.

The Sanskrit grammar is a perfect model.

>
> >>
> >> 5) since you want to do MT, you need one more theory to handle the
> >> target language ... maybe even an IL (interlingua) model if you go
> >> that route instead of direct translation.
> >
> > First of all we need a good English to Malayalam dictionary in electronic
> > format, one that gives the exact meaning, POS, etc., not one that just
> > says Science - ശാസ്ത്രം, തര്ക്കശാസ്ത്രം.
>
> A POS-tagged dataset is just one component of a complete corpus.

A POS-tagged corpus is one variety of corpus.

>
> Regards
> Rajeev J Sebastian

--
**********************************
JAGANADH G
http://jaganadhg.freeflux.net/blog

"Freedom is the only law".
"Freedom Unplugged"
http://www.ilug-tvm.org

You received this message because you are subscribed to the Google Groups "ilug-tvm" group.
For details visit the website: www.ilug-tvm.org or the google group page: http://groups.google.com/group/ilug-tvm?hl=en
