So what should we do first? Getting an e-dictionary is the basic step, right?
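To make the e-dictionary idea a bit more concrete, here is a minimal sketch of what one entry could look like if every sense carries its own POS tag and a short gloss, instead of a flat list of Malayalam words. This is only an assumption about a possible format, not anything the list has agreed on: the names `Sense`, `Entry`, `build_index` and `lookup`, the tag `NOUN` and the English gloss are all made up for illustration. Only the Science - ശാസ്ത്രം pair is taken from the thread below.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Sense:
    """One meaning of a headword (hypothetical format)."""
    pos: str          # part-of-speech tag; the tag set is an assumption
    malayalam: str    # Malayalam equivalent for this sense
    gloss: str = ""   # short English note that pins down the meaning


@dataclass
class Entry:
    """An English headword with its POS-tagged senses."""
    headword: str
    senses: List[Sense] = field(default_factory=list)


def build_index(entries: List[Entry]) -> Dict[str, Entry]:
    """Index entries by lower-cased headword for case-insensitive lookup."""
    return {entry.headword.lower(): entry for entry in entries}


def lookup(index: Dict[str, Entry], word: str,
           pos: Optional[str] = None) -> List[str]:
    """Return Malayalam equivalents of `word`, optionally filtered by POS."""
    entry = index.get(word.lower())
    if entry is None:
        return []
    return [s.malayalam for s in entry.senses if pos is None or s.pos == pos]


# Illustrative data only; the gloss is an example, not a vetted definition.
SAMPLE = [
    Entry("science", [
        Sense("NOUN", "ശാസ്ത്രം", "systematic study of the natural world"),
    ]),
]
INDEX = build_index(SAMPLE)

if __name__ == "__main__":
    print(lookup(INDEX, "Science", pos="NOUN"))
```

Keeping the POS on the sense rather than on the headword means a word that is both a noun and a verb can list different Malayalam equivalents for each, which is what a translation step would need to pick from.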
On Jul 27, 9:28 pm, Varewoolf <[email protected]> wrote:
> i have read these mails.. oops, I don't have this much knowledge about
> MT, corpus etc.. but i am more than ready to do any volunteer work
> to make this happen. i have a good command over Malayalam and English..
> so how could this translation actually work ??
> show me the path, i will walk through..
>
> On Mon, Jul 27, 2009 at 12:23 PM, JAGANADH G <[email protected]> wrote:
>
> > On Mon, Jul 27, 2009 at 11:33 AM, jinesh kj <[email protected]> wrote:
>
> >> hi all,
>
> >> Machine Translation is one of the toughest language computing problems,
> >> and newer ideas and thoughts are coming up every year. The Ministry of
> >> Communication and Information Technology is spending a lot of money on
> >> the project (along with some other projects). An M.T. system for
> >> Malayalam is being developed by Tamil University, Tanchavoor. From what
> >> i understand, they are using a corpus based approach, tailored for a set
> >> of sentences rather than a generic algorithm.
>
> > Ya I know this. Thanjavoor people are working on Tamil<->Malayalam machine
> > translation. They are customizing the anusaarak approach developed by the
> > Akshar Bharati group. That system is a language acquisition system rather
> > than MT (in the original developers' view). The system's algorithm has its
> > own advantages and limitations. A group of C-DAC people are also involved
> > in English to Indian languages (including Malayalam). I don't know whether
> > any of these systems are open or not. That is why I was not mentioning the
> > name.
>
> >> When i talked to a friend, he pointed out some things like: we need to
> >> think of the deviations from base grammar rules when designing a system
> >> for real translation. I think whatever we do, the translation process
> >> will remain the same (remove all agglutination, identify key words and
> >> their POS, and using that information, translate). Sandhi splitting and
> >> POS tagging are the important steps to tackle in my view.
>
> > More clearly: Source language sentence -> Parsing (for pattern
> > identification) -> Convert to target language syntactic pattern ->
> > Target language text generation. This is the broad block view of an MT
> > system. Whether a POS tagger should be there depends on your design.
> > The harder part in Indian language to Indian language translation (from
> > my experience) is morphological analysis as well as Sandhi splitting.
> > Some sort of heuristics is required for Sandhi splitting. Computing
> > Kerala Paniniyam will not solve the problem. Even for Sanskrit, extensive
> > Sandhi rules are there, but people who are engaged in Sanskrit computing
> > call it a baffling problem. Sandhi splitter is a required component in a
> > morphological analyzer, and a morphological analyzer requires a Sandhi
> > splitter (a kind of deadlock).
>
> >> May be Jagan, Santhosh, Rajeev and all can add more to this. From what i
> >> understand, a normal rule-based system won't work that well for Malayalam
> >> since rules are not much followed in the normal writing scheme (both are
> >> right kind of approach).
>
> > If somebody is really interested we can build a small system within one
> > year. I will share the plan within a day or two.
>
> >> cheers
>
> >> Jinesh K J
>
> >> On Mon, Jul 27, 2009 at 10:26 AM, JAGANADH G <[email protected]> wrote:
>
> >>> If you are really interested drop me a mail. Are you familiar with Perl
> >>> programming ?
>
> >>> On Sun, Jul 26, 2009 at 10:29 PM, Varewoolf <[email protected]> wrote:
>
> >>>> so what might be the next step??
>
> >>>> On Sat, Jul 25, 2009 at 10:31 AM, JAGANADH G <[email protected]> wrote:
>
> >>>> > On Sat, Jul 25, 2009 at 12:41 AM, Rajeev J Sebastian
> >>>> > <[email protected]> wrote:
>
> >>>> >> On Fri, Jul 24, 2009 at 7:02 PM, JAGANADH G <[email protected]>
> >>>> >> wrote:
>
> >>>> >> > On Fri, Jul 24, 2009 at 5:29 PM, Rajeev J Sebastian
> >>>> >> > <[email protected]> wrote:
>
> >>>> >> >> On Fri, Jul 24, 2009 at 5:19 PM, Varewoolf <[email protected]>
> >>>> >> >> wrote:
>
> >>>> >> >> > i am so much interested to make this happen... i am always
> >>>> >> >> > interested in linguistics...
> >>>> >> >> > anybody tell me what are the things we need primarily??
>
> >>>> >> >> How about ...
>
> >>>> >> >> 1) 50+ years of research (actually, 2000 if you consider Panini)
>
> >>>> >> > It is history ? If you can work hard you can reduce the zero from
> >>>> >> > it.
>
> >>>> >> Huh ?
>
> >>>> >> >> 2) Extremely large corpus ... if you want to make a practical
> >>>> >> >> system
>
> >>>> >> > Only if you adopt a corpus-based model. That is not going to be
> >>>> >> > practical right now in the case of English to Malayalam
> >>>> >> > translation.
>
> >>>> >> It is not practical to make *anything* without a corpus. Even if you
> >>>> >> use a non-corpus based methodology to perform translation, you still
> >>>> >> need a large corpus to *validate* that your method works for more
> >>>> >> than toy examples. This is the biggest problem that faces any NLP
> >>>> >> work for Indic languages, and one that some glorified institutions
> >>>> >> in India neither build up nor share, most probably because all their
> >>>> >> systems are capable of is translating toy examples.
>
> >>>> > I know that there are non-free systems under development which are
> >>>> > more advanced than the Google Translate service (English-Hindi). But
> >>>> > when they will release them I don't know.
>
> >>>> >> >> 3) Large and talented team good in computational linguistics
>
> >>>> >> > Where is it? We can build this up.
>
> >>>> >> Best of Luck.
>
> >>>> >> >> 4) a very practical theory that can model language effectively for
> >>>> >> >> your purposes (seriously lacking for even small use cases in even
> >>>> >> >> major languages)
>
> >>>> >> > A perfect grammar for Malayalam is required, especially in syntax
> >>>> >> > and morphology. Malayalam really lacks such studies.
>
> >>>> >> I don't think any language has such an in-depth model that could be
> >>>> >> used for generic MT. There are of course, special case models ...
> >>>> >> which can be used for special cases.
>
> >>>> > The Sanskrit grammar is a perfect model.
>
> >>>> >> >> 5) since you want to do MT, you need one more theory to handle the
> >>>> >> >> target language ... maybe even an IL model if you go that route
> >>>> >> >> instead of direct translation.
>
> >>>> >> > First of all we need a good English to Malayalam dict in e-format,
> >>>> >> > which gives exact meaning, POS, etc. Not like one saying Science -
> >>>> >> > ശാസ്ത്രം, തര്ക്കശാസ്ത്രം, and the like.
>
> >>>> >> POS tagged dataset is just one component of a complete corpus.
>
> >>>> > POS tagged corpus is a variety of corpus.
>
> >>>> >> Regards
> >>>> >> Rajeev J Sebastian
>
> >>>> > --
> >>>> > **********************************
> >>>> > JAGANADH G
> >>>> > http://jaganadhg.freeflux.net/blog
>
> >>> --
> >>> **********************************
> >>> JAGANADH G
> >>> http://jaganadhg.freeflux.net/blog
>
> >> --
> >> My Feelings, Expressions -
> >> http://logbookofanobserver.blogspot.com
>
> >> My scribblings -
> >> http://logbookofanobserver.wordpress.com
>
> >> SMC : My computer, My language http://smc.org.in
> >> സ്വതന്ത്ര മലയാളം കമ്പ്യൂട്ടിങ്ങ്, എന്റെ കമ്പ്യൂട്ടറിന് എന്റെ ഭാഷ
>
> > --
> > **********************************
> > JAGANADH G
> > http://jaganadhg.freeflux.net/blog
