Re: [smc-discuss] Re: [fsug-tvm] Re: english malayalam translator

Varewoolf Mon, 27 Jul 2009 09:28:09 -0700

I have read these mails. Oops, I don't have much knowledge about MT, corpora and
so on, but I am more than ready to do any volunteer work to make this happen.
I have a good command of Malayalam and English. So how would this translation
actually work? Show me the path and I will walk it.


On Mon, Jul 27, 2009 at 12:23 PM, JAGANADH G<[email protected]> wrote:
>
>
> On Mon, Jul 27, 2009 at 11:33 AM, jinesh kj <[email protected]> wrote:
>>
>> hi all,
>>
>> Machine translation is one of the toughest language computing problems, and
>> new ideas and approaches come up every year. The Ministry of Communications
>> and Information Technology is spending a lot of money on the project (along
>> with some other projects). An MT system for Malayalam is being developed by
>> Tamil University, Thanjavur. From what I understand, they are using a
>> corpus-based approach tailored to a set of sentences rather than a generic
>> algorithm.
>
> Yes, I know this. The Thanjavur people are working on Tamil <-> Malayalam
> machine translation. They are customizing the Anusaaraka approach developed
> by the Akshar Bharati group. In the original developers' view, that system is
> a language acquisition system rather than MT. The system's algorithm has its
> own advantages and limitations. A group of C-DAC people is also involved in
> English to Indian languages MT (including Malayalam). I don't know whether any
> of these systems are open or not, which is why I was not mentioning names.
>
>>
>> When I talked to a friend, he pointed out that we need to think about
>> deviations from the base grammar rules when designing a system for real
>> translation. I think whatever we do, the translation process will remain the
>> same: remove all agglutination, identify the key words and their POS, and
>> translate using that information. In my view, sandhi splitting and POS
>> tagging are the important steps to tackle.
>
> More clearly: source-language sentence -> parsing (for pattern
> identification) -> conversion to the target-language syntactic pattern ->
> target-language text generation. This is the broad block view of an MT
> system. Whether a POS tagger should be part of it depends on your design.
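The block view above can be sketched as a tiny transfer pipeline. This is only a toy, assuming a hypothetical three-word lexicon and a single English N-V-N pattern; it shows the parse -> transfer -> generate stages, with the transfer step turning English SVO order into Malayalam SOV order. It ignores case marking, agreement and everything else a real generator would need.

```python
# Hypothetical toy lexicon: English word -> (Malayalam form, POS)
LEXICON = {
    "raman": ("രാമൻ", "N"),
    "reads": ("വായിക്കുന്നു", "V"),
    "book": ("പുസ്തകം", "N"),
    "a": (None, "DET"),  # Malayalam has no article, so it is dropped
}

def parse(sentence):
    """Pattern identification: tag each token and fill the S-V-O slots."""
    tagged = [(w, LEXICON[w][1]) for w in sentence.lower().split()]
    content = [(w, t) for w, t in tagged if t != "DET"]
    pattern = [t for _, t in content]
    if pattern != ["N", "V", "N"]:
        raise ValueError(f"unsupported pattern: {pattern}")
    subj, verb, obj = (w for w, _ in content)
    return {"subj": subj, "verb": verb, "obj": obj}

def transfer(tree):
    """Convert the English SVO pattern to the Malayalam SOV pattern."""
    return [tree["subj"], tree["obj"], tree["verb"]]

def generate(order):
    """Target text generation: emit each word's Malayalam form in order."""
    return " ".join(LEXICON[w][0] for w in order)

def translate(sentence):
    return generate(transfer(parse(sentence)))

print(translate("Raman reads a book"))  # രാമൻ പുസ്തകം വായിക്കുന്നു
```

Keeping the stages as separate functions means a POS tagger, a richer parser or a morphological generator could replace any one stage without touching the others.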
> The harder part in Indian-language-to-Indian-language MT (from my experience)
> is morphological analysis as well as sandhi splitting. Some sort of
> heuristics is required for sandhi splitting. Implementing Kerala Paniniyam
> computationally will not solve the problem. Even Sanskrit has extensive
> sandhi rules, yet people engaged in Sanskrit computing call it a baffling
> problem. A sandhi splitter is a required component of a morphological
> analyzer, and the morphological analyzer in turn requires a sandhi splitter
> (a kind of deadlock).
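One possible heuristic for the sandhi problem above is a lexicon-driven splitter: try every known prefix, and when plain concatenation fails, undo a boundary rule. The sketch below assumes a simplified romanization, a hypothetical five-item lexicon, and only one rule (a y/v glide inserted between vowels, as in tii + il -> tiiyil "in the fire" and puu + il -> puuvil "in the flower"). It also shows the deadlock: the splitter only works if something already knows the stems and suffixes, which is the morphological analyzer's job.

```python
VOWELS = set("aeiou")
GLIDES = ("y", "v")  # a tuple, so the empty string never counts as a glide

def split_sandhi(word, lexicon):
    """Return one segmentation of `word` into lexicon items, or None."""
    if not word:
        return []
    for end in range(len(word), 0, -1):  # try the longest known prefix first
        head, rest = word[:end], word[end:]
        if head not in lexicon:
            continue
        # Case 1: plain concatenation, head + rest
        tail = split_sandhi(rest, lexicon)
        if tail is not None:
            return [head] + tail
        # Case 2: undo glide insertion between a vowel-final head and a
        # vowel-initial next item (tii + il -> tiiyil, puu + il -> puuvil)
        if head[-1] in VOWELS and rest[:1] in GLIDES and rest[1:2] in VOWELS:
            tail = split_sandhi(rest[1:], lexicon)
            if tail is not None:
                return [head] + tail
    return None

# Hypothetical mini-lexicon in a simplified romanization
LEXICON = {"tii", "puu", "il", "pustakam", "undu"}

for w in ["tiiyil", "puuvil", "pustakamundu"]:
    print(w, "->", split_sandhi(w, LEXICON))
```

Real Malayalam sandhi has many more rules (gemination, deletion, substitution), so a practical splitter would need a rule table and some ranking of the competing splits.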
>>
>> Maybe Jagan, Santhosh, Rajeev and others can add more to this. From what I
>> understand, a normal rule-based system won't work that well for Malayalam,
>> since the rules are not followed much in everyday writing (both are the
>> right kind of approach).
>
> If somebody is really interested, we can build a small system within one
> year. I will share the plan in a day or two.
>
>>
>> cheers
>>
>> Jinesh K J
>>
>> On Mon, Jul 27, 2009 at 10:26 AM, JAGANADH G <[email protected]> wrote:
>>>
>>> If you are really interested, drop me a mail. Are you familiar with Perl
>>> programming?
>>>
>>> On Sun, Jul 26, 2009 at 10:29 PM, Varewoolf <[email protected]> wrote:
>>>>
>>>> So what might be the next step?
>>>>
>>>> On Sat, Jul 25, 2009 at 10:31 AM, JAGANADH G<[email protected]> wrote:
>>>> >
>>>> >
>>>> > On Sat, Jul 25, 2009 at 12:41 AM, Rajeev J Sebastian
>>>> > <[email protected]> wrote:
>>>> >>
>>>> >> On Fri, Jul 24, 2009 at 7:02 PM, JAGANADH G<[email protected]>
>>>> >> wrote:
>>>> >> >
>>>> >> >
>>>> >> > On Fri, Jul 24, 2009 at 5:29 PM, Rajeev J Sebastian
>>>> >> > <[email protected]> wrote:
>>>> >> >>
>>>> >> >> On Fri, Jul 24, 2009 at 5:19 PM, Varewoolf<[email protected]>
>>>> >> >> wrote:
>>>> >> >> >
>>>> >> >> > I am very interested in making this happen... I have always been
>>>> >> >> > interested in linguistics...
>>>> >> >> > Can anybody tell me what we primarily need?
>>>> >> >>
>>>> >> >> How about ...
>>>> >> >>
>>>> >> >> 1) 50+ years of research (actually, 2000 if you consider Panini)
>>>> >> >
>>>> >> > Is that history? If you work hard, you can knock a zero off it.
>>>> >>
>>>> >> Huh ?
>>>> >>
>>>> >> >>
>>>> >> >> 2) Extremely large corpus ... if you want to make a practical
>>>> >> >> system
>>>> >> >
>>>> >> > Only if you adopt a corpus-based model. That is not going to be
>>>> >> > practical right now in the case of English to Malayalam translation.
>>>> >>
>>>> >> It is not practical to make *anything* without a corpus. Even if you
>>>> >> use a non-corpus-based methodology to perform translation, you still
>>>> >> need a large corpus to *validate* that your method works for more
>>>> >> than toy examples. This is the biggest problem facing any NLP work for
>>>> >> Indic languages: a large corpus is something that some glorified
>>>> >> institutions in India neither build up nor share, most probably
>>>> >> because all their systems are capable of is translating toy examples.
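The validation point above can be made concrete with held-out evaluation: train or tune on part of a tagged corpus and score on the rest. Below is a minimal sketch using a most-frequent-tag baseline; the tag set and the tiny "corpus" are made-up illustrations, and any rule-based tagger could be passed in as `tagger` instead.

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sents):
    """Most-frequent-tag baseline: each word gets the tag it had most often."""
    per_word = defaultdict(Counter)
    overall = Counter()
    for sent in tagged_sents:
        for word, tag in sent:
            per_word[word][tag] += 1
            overall[tag] += 1
    fallback = overall.most_common(1)[0][0]  # used for unseen words
    best = {w: c.most_common(1)[0][0] for w, c in per_word.items()}
    return lambda word: best.get(word, fallback)

def accuracy(tagger, gold_sents):
    """Token-level accuracy of `tagger` on held-out gold-tagged sentences."""
    total = correct = 0
    for sent in gold_sents:
        for word, gold in sent:
            total += 1
            correct += tagger(word) == gold
    return correct / total if total else 0.0

# Made-up training and held-out data (word, tag), for illustration only
train = [
    [("the", "DET"), ("light", "N"), ("is", "V"), ("bright", "ADJ")],
    [("a", "DET"), ("light", "ADJ"), ("bag", "N")],
    [("light", "N"), ("fades", "V")],
]
held_out = [[("the", "DET"), ("light", "ADJ"), ("box", "N")]]

tagger = train_baseline(train)
print(f"held-out accuracy: {accuracy(tagger, held_out):.2f}")  # 0.67
```

The held-out sentence shows why the corpus matters: "light" used as an adjective is mis-tagged because the training data mostly saw it as a noun, and only a large, varied test set exposes errors like that.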
>>>> >
>>>> > I know that there are non-free systems under development that are more
>>>> > advanced than the Google Translate service (English-Hindi). But I don't
>>>> > know when they will release them.
>>>> >
>>>> >>
>>>> >> >>
>>>> >> >> 3) Large and talented team good in computational linguistics
>>>> >> >
>>>> >> > Where is it? We can build this up.
>>>> >>
>>>> >> Best of Luck.
>>>> >>
>>>> >> >>
>>>> >> >> 4) a very practical theory that can model language effectively for
>>>> >> >> your purposes (seriously lacking for even small use cases in even
>>>> >> >> major languages)
>>>> >> >
>>>> >> > A perfect grammar for Malayalam is required, especially for syntax
>>>> >> > and morphology. Malayalam really lacks such studies.
>>>> >>
>>>> >> I don't think any language has such an in-depth model that could be
>>>> >> used for generic MT. There are of course, special case models ...
>>>> >> which can be used for special cases.
>>>> >
>>>> > The Sanskrit grammar is a perfect model.
>>>> >
>>>> >>
>>>> >> >>
>>>> >> >> 5) Since you want to do MT, you need one more theory to handle the
>>>> >> >> target language ... maybe even an IL (interlingua) model if you go
>>>> >> >> that route instead of direct translation.
>>>> >> >
>>>> >> > First of all, we need a good English-to-Malayalam dictionary in
>>>> >> > electronic form, one that gives the exact meaning, POS, etc. Not
>>>> >> > like one saying Science - ശാസ്ത്രം (science), തര്‍ക്കശാസ്ത്രം (logic).
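A dictionary of the kind described above could key entries by headword and POS and store a short sense gloss with each Malayalam equivalent, instead of one flat list of translations. This is a hypothetical schema, not an existing format; the two entries for "light" are illustrative only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    headword: str  # English headword
    pos: str       # part of speech, e.g. "N", "ADJ"
    sense: str     # short English gloss separating this sense from others
    target: str    # Malayalam equivalent for this sense

# Illustrative entries; a real dictionary would be loaded from a data file.
ENTRIES = [
    Entry("light", "N", "illumination", "വെളിച്ചം"),
    Entry("light", "ADJ", "not heavy", "ഭാരം കുറഞ്ഞ"),
]

# Index by (headword, POS) so a tagger's output selects the right entries.
INDEX = {}
for entry in ENTRIES:
    INDEX.setdefault((entry.headword, entry.pos), []).append(entry)

def lookup(word, pos):
    """Return all senses of `word` under one POS (empty list if none)."""
    return INDEX.get((word.lower(), pos), [])

for sense in lookup("light", "ADJ"):
    print(sense.sense, "->", sense.target)
```

Keying on POS lets a tagger narrow the choice first; the sense gloss is what remains for choosing among several entries under the same POS.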
>>>> >>
>>>> >> POS tagged dataset is just one component of a complete corpus.
>>>> >
>>>> > A POS-tagged corpus is one variety of corpus.
>>>> >
>>>> >>
>>>> >> Regards
>>>> >> Rajeev J Sebastian
>>>> >>
>>>> >>
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > **********************************
>>>> > JAGANADH G
>>>> > http://jaganadhg.freeflux.net/blog
>>>> >
>>>> >
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> **********************************
>>> JAGANADH G
>>> http://jaganadhg.freeflux.net/blog
>>>
>>>
>>
>>
>>
>> --
>> My Feelings,Expressions-
>> http://logbookofanobserver.blogspot.com
>>
>> My scribblings-
>> http://logbookofanobserver.wordpress.com
>>
>> SMC : My computer, My language http://smc.org.in
>> സ്വതന്ത്ര മലയാളം കമ്പ്യൂട്ടിങ്ങ്, എന്റെ കമ്പ്യൂട്ടറിന് എന്റെ ഭാഷ (Swathanthra Malayalam Computing: my language for my computer)
>>
>>
>
>
>
> --
> **********************************
> JAGANADH G
> http://jaganadhg.freeflux.net/blog
>
>

--~--~---------~--~----~------------~-------~--~----~
"Freedom is the only law". 
"Freedom Unplugged"
http://www.ilug-tvm.org

You received this message because you are subscribed to the Google
Groups "ilug-tvm" group.
-~----------~----~----~----~------~----~------~--~---
