Karan,

Parallel data such as the resources you mention are of little use for Apertium, a rule-based system.

Is the transfer grammar mentioned free/open-source? Could it be turned into Apertium rules?

Come back when you have a clearer idea!

Cheers

Mikel

On 02/28/2014 05:42 PM, karan singla wrote:
Hello all,

I am Karan Singla, pursuing a BTech in CSE and an MS in Computational Linguistics at IIIT, Hyderabad. I have been working rigorously on Machine Translation for the past year, and was part of the SEECAT project at CBS, Denmark.

I haven't worked in open source before, but I would like to contribute to this project.

I am thinking of adding a new language pair to the project, as there is no existing MT system for English-Hindi and Hindi-English. Various experiments can be done to build a state-of-the-art system.



Motivation:
===> Why Hindi: it can be kept as a pivot language for various other Indian languages that have a similar word order.

===> Why this pair: there is no existing good MT model for this language pair.


Freely Available Parallel Data released in WMT 14
Large Mono-Lingual Data Available
Parallel Data in 10 Indian Languages including Hindi (further chaining can be tried)

Experiments in Mind:
Data Cleaning (the data is not really parallel, as it was formed by extracting text from PDFs etc.)
Noise Cleaning in the Basic Phrase Table (removing misaligned pairs)
Applying Transfer Grammar (a lab at LTRC, IIIT-Hyderabad has a transfer grammar for reordering English into Hindi word order; this has proved to give better alignments)
Re-ranking (using an RNN-based LM with features such as those from the MICA parser)

The other part of the project will be to assist the translator in this CAT system with possible translations from the translation memory and the MT output. The translator can choose among them and post-edit the result.

Do you think this could be a good idea?

Also, if there is any, I would be happy to know the progress of the chaining experiment for translation.

Regards,
Karan
LTRC, IIIT-Hyderabad


------------------------------------------------------------------------------
Flow-based real-time traffic analytics software. Cisco certified tool.
Monitor traffic, SLAs, QoS, Medianet, WAAS etc. with NetFlow Analyzer
Customize your own dashboards, set traffic alerts and generate reports.
Network behavioral analysis & security monitoring. All-in-one tool.
http://pubads.g.doubleclick.net/gampad/clk?id=126839071&iu=/4140/ostg.clktrk


_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff


--
Mikel L. Forcada (http://www.dlsi.ua.es/~mlf/)
Departament de Llenguatges i Sistemes Informàtics
Universitat d'Alacant
E-03071 Alacant, Spain
Phone: +34 96 590 9776
Fax: +34 96 590 9326
