[fsug-tvm] Re: english malayalam translator

jeevachaithanyan sivanandan Tue, 28 Jul 2009 10:59:17 -0700

So what should we do first? Getting an e-dictionary is the basic step,
right?
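
A minimal sketch of what one e-dictionary entry could hold, assuming Python and the requirement raised further down the thread that each entry gives the exact meaning and the POS. The class names (Sense, Entry), the field names and the gloss text are all made up for illustration; only the Malayalam word is taken from the thread.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sense:
    pos: str      # part-of-speech tag, e.g. "noun"
    target: str   # Malayalam equivalent
    gloss: str    # short English note that tells this sense apart from others


@dataclass
class Entry:
    headword: str
    senses: List[Sense] = field(default_factory=list)

    def targets(self, pos):
        """Return the Malayalam equivalents recorded for one part of speech."""
        return [s.target for s in self.senses if s.pos == pos]


# Hypothetical entry; the gloss wording is illustrative, not from a real dictionary.
science = Entry("science", [
    Sense("noun", "ശാസ്ത്രം", "systematic study of the natural world"),
])

print(science.targets("noun"))
print(science.targets("verb"))
```

Keeping POS and a gloss on every sense is what lets a later step pick one equivalent instead of getting an undifferentiated list of words.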


On Jul 27, 9:28 pm, Varewoolf <[email protected]> wrote:
> I have read these mails. Oops, I don't have that much knowledge about
> MT, corpora and so on, but I am more than ready to do any volunteer work
> to make this happen. I have a good command of Malayalam and English. So how
> would this translation actually work?
> Show me the path, I will walk through it.
>
> On Mon, Jul 27, 2009 at 12:23 PM, JAGANADH G<[email protected]> wrote:
>
> > On Mon, Jul 27, 2009 at 11:33 AM, jinesh kj <[email protected]> wrote:
>
> >> Hi all,
>
> >> Machine translation is one of the toughest language computing problems, and
> >> new ideas and approaches come up every year. The Ministry of Communications
> >> and Information Technology is spending a lot of money on the project (along
> >> with some other projects). An MT system for Malayalam is being developed by
> >> Tamil University, Thanjavur. From what I understand, they are using a
> >> corpus-based approach tailored to a set of sentences rather than a generic
> >> algorithm.
>
> > Yes, I know this. The Thanjavur people are working on Tamil<->Malayalam machine
> > translation. They are customizing the Anusaaraka approach developed by the
> > Akshar Bharati group. That system is a language acquisition system rather than
> > MT (in the original developers' view). The system's algorithm has its own
> > advantages and limitations. A group of C-DAC people are also involved in
> > English to Indian languages (including Malayalam). I don't know whether any of
> > these systems are open or not. That is why I did not mention the names.
>
> >> When I talked to a friend, he pointed out that we need to think about
> >> deviations from the base grammar rules when designing a system for real
> >> translation. I think whatever we do, the translation process will remain the
> >> same (remove all agglutination, identify the key words and their POS, and
> >> translate using that information). Sandhi splitting and POS tagging are the
> >> important steps to tackle, in my view.
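
A toy sketch of the sandhi-splitting step, assuming Python and a tiny hand-made lexicon. It uses two well-known Sanskrit vowel rules (a + i -> e, a + u -> o) in simplified romanization, not Malayalam rules, and the names REVERSE_RULES, LEXICON and split_sandhi are made up for illustration. The idea is to undo a rule at each position and keep only the splits whose two halves are known words.

```python
# Reverse rules: surface vowel -> possible (left ending, right beginning) pairs.
# Only two Sanskrit vowel rules are covered here: a + i -> e and a + u -> o.
REVERSE_RULES = {
    "e": [("a", "i")],
    "o": [("a", "u")],
}

# Hypothetical toy lexicon in simplified romanization (no diacritics).
LEXICON = {"maha", "indra", "na", "iti", "surya", "udaya"}


def split_candidates(word):
    """Yield every (left, right) pair allowed by plain joining or one reverse rule."""
    # Plain concatenation: no sound change at the boundary.
    for i in range(1, len(word)):
        yield word[:i], word[i:]
    # Undo one vowel merger at each position where a rule applies.
    for i, ch in enumerate(word):
        for left_end, right_start in REVERSE_RULES.get(ch, []):
            yield word[:i] + left_end, right_start + word[i + 1:]


def split_sandhi(word, lexicon=LEXICON):
    """Keep only the candidate splits whose two halves are both in the lexicon."""
    return [(l, r) for l, r in split_candidates(word) if l in lexicon and r in lexicon]


print(split_sandhi("mahendra"))
print(split_sandhi("suryodaya"))
```

A real splitter would need far more rules and some way to rank competing splits; the lexicon check is where the dependence on morphological knowledge shows up.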
>
> > More clearly: source language sentence -> parsing (for pattern identification)
> > -> conversion to target language syntactic pattern -> target language text
> > generation. This is the broad block view of an MT system. Whether a POS tagger
> > should be there depends on your design.
> > The harder part in Indian language to Indian language MT (from my experience)
> > is morphological analysis as well as sandhi splitting. Some sort of
> > heuristics is required for sandhi splitting. Computing Kerala Paniniyam will
> > not solve the problem. Even for Sanskrit, where extensive sandhi rules exist,
> > people engaged in Sanskrit computing call it a baffling problem. A sandhi
> > splitter is a required component of a morphological analyzer, and the sandhi
> > splitter in turn needs morphological analysis (a kind of deadlock).
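
The block view above (parse, convert to the target pattern, generate) can be sketched in a few lines. This is only a toy under stated assumptions: Python, one hard-coded NOUN VERB NOUN pattern, a subject-object-verb target order as in Malayalam, and placeholder strings such as <book> instead of real Malayalam words. LEXICON and all function names are hypothetical.

```python
# Toy lexicon: source word -> (POS tag, placeholder target form).
# Every entry is hypothetical; the placeholders stand in for real target words.
LEXICON = {
    "rama": ("NOUN", "<rama>"),
    "reads": ("VERB", "<read>"),
    "books": ("NOUN", "<book>"),
}


def parse(sentence):
    """Tag each word and accept only the single pattern NOUN VERB NOUN."""
    words = sentence.lower().split()
    tagged = [(w, LEXICON[w][0]) for w in words]
    pattern = [tag for _, tag in tagged]
    if pattern != ["NOUN", "VERB", "NOUN"]:
        raise ValueError("unsupported pattern: %s" % pattern)
    subject, verb, obj = (w for w, _ in tagged)
    return {"subject": subject, "verb": verb, "object": obj}


def transfer(tree):
    """Reorder subject-verb-object into subject-object-verb."""
    return [tree["subject"], tree["object"], tree["verb"]]


def generate(ordered):
    """Replace each source word with its target form and join the result."""
    return " ".join(LEXICON[w][1] for w in ordered)


def translate(sentence):
    return generate(transfer(parse(sentence)))


print(translate("Rama reads books"))
```

Each stage is a separate function so that a morphological analyzer or a POS tagger could be slotted in before parse without touching the rest.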
>
> >> Maybe Jagan, Santhosh Rajeev and all can add more to this. From what I
> >> understand, a normal rule-based system won't work that well for Malayalam,
> >> since the rules are not followed much in normal writing (both are
> >> the right kind of approach).
>
> > If somebody is really interested, we can build a small system within one year.
> > I will share the plan within a day or two.
>
> >> cheers
>
> >> Jinesh K J
>
> >> On Mon, Jul 27, 2009 at 10:26 AM, JAGANADH G <[email protected]> wrote:
>
> >>> If you are really interested, drop me a mail. Are you familiar with Perl
> >>> programming?
>
> >>> On Sun, Jul 26, 2009 at 10:29 PM, Varewoolf <[email protected]> wrote:
>
> >>>> So what might be the next step?
>
> >>>> On Sat, Jul 25, 2009 at 10:31 AM, JAGANADH G<[email protected]> wrote:
>
> >>>> > On Sat, Jul 25, 2009 at 12:41 AM, Rajeev J Sebastian
> >>>> > <[email protected]> wrote:
>
> >>>> >> On Fri, Jul 24, 2009 at 7:02 PM, JAGANADH G<[email protected]>
> >>>> >> wrote:
>
> >>>> >> > On Fri, Jul 24, 2009 at 5:29 PM, Rajeev J Sebastian
> >>>> >> > <[email protected]> wrote:
>
> >>>> >> >> On Fri, Jul 24, 2009 at 5:19 PM, Varewoolf<[email protected]>
> >>>> >> >> wrote:
>
> >>>> >> >> > I am very much interested in making this happen... I have
> >>>> >> >> > always been interested in linguistics...
> >>>> >> >> > Can anybody tell me what things we need primarily?
>
> >>>> >> >> How about ...
>
> >>>> >> >> 1) 50+ years of research (actually, 2000 if you consider Panini)
>
> >>>> >> > Is that history? If you work hard you can drop the zero from
> >>>> >> > it.
>
> >>>> >> Huh ?
>
> >>>> >> >> 2) Extremely large corpus ... if you want to make a practical
> >>>> >> >> system
>
> >>>> >> > Only if you adopt a corpus-based model. That is not going to be
> >>>> >> > practical right now in the case of English to Malayalam
> >>>> >> > translation.
>
> >>>> >> It is not practical to make *anything* without a corpus. Even if you
> >>>> >> use a non-corpus based methodology to perform translation, you still
> >>>> >> need a large corpus to *validate* that your method works for more
> >>>> >> than toy examples. This is the biggest problem facing any NLP work
> >>>> >> for Indic languages: some glorified institutions in India neither
> >>>> >> build up such a corpus nor share one, most probably because all
> >>>> >> their systems are capable of is translating toy examples.
>
> >>>> > I know that there are non-free systems under development which are
> >>>> > more advanced than the Google Translate service (English-Hindi). But
> >>>> > I don't know when they will release them.
>
> >>>> >> >> 3) Large and talented team good in computational linguistics
>
> >>>> >> > Where is it? We can build this up.
>
> >>>> >> Best of Luck.
>
> >>>> >> >> 4) a very practical theory that can model language effectively for
> >>>> >> >> your purposes (seriously lacking for even small use cases in even
> >>>> >> >> major languages)
>
> >>>> >> > A perfect grammar for Malayalam is required, especially in syntax
> >>>> >> > and morphology. Malayalam really lacks such studies.
>
> >>>> >> I don't think any language has such an in-depth model that could be
> >>>> >> used for generic MT. There are of course, special case models ...
> >>>> >> which can be used for special cases.
>
> >>>> > The Sanskrit grammar is a perfect model.
>
> >>>> >> >> 5) since you want to do MT, you need one more theory to handle the
> >>>> >> >> target language ... maybe even an IL model if you go that route
> >>>> >> >> instead of direct translation.
>
> >>>> >> > First of all we need a good English to Malayalam dictionary in
> >>>> >> > e-format, which gives the exact meaning, POS, etc. Not like one
> >>>> >> > that just says Science - ശാസ്ത്രം, തര്‍ക്കശാസ്ത്രം.
>
> >>>> >> POS tagged dataset is just one component of a complete corpus.
>
> >>>> > POS Tagged corpus is a variety of corpus.
>
> >>>> >> Regards
> >>>> >> Rajeev J Sebastian
>
> >>>> > --
> >>>> > **********************************
> >>>> > JAGANADH G
> >>>> > http://jaganadhg.freeflux.net/blog
>
>
> >> --
> >> My Feelings, Expressions -
> >> http://logbookofanobserver.blogspot.com
>
> >> My scribblings -
> >> http://logbookofanobserver.wordpress.com
>
> >> SMC: My computer, My language http://smc.org.in
> >> സ്വതന്ത്ര മലയാളം കമ്പ്യൂട്ടിങ്ങ്, എന്റെ കമ്പ്യൂട്ടറിന് എന്റെ ഭാഷ
>

--~--~---------~--~----~------------~-------~--~----~
"Freedom is the only law". 
"Freedom Unplugged"
http://www.ilug-tvm.org

