Karan,
Parallel data such as the resources you mention are of little use for
Apertium, a rule-based system.
Is the transfer grammar mentioned free/open-source? Could it be turned
into Apertium rules?
Come back when you have a clearer idea!
Cheers
Mikel
On 02/28/2014 05:42 PM, karan singla wrote:
Hello all,
I am Karan Singla, pursuing a BTech in CSE and an MS in Computational
Linguistics at IIIT, Hyderabad. I have been working rigorously on
Machine Translation for the past year, and was part of the SEECAT project
at CBS, Denmark.
I haven't worked in open source but would like to contribute to this
project.
I am thinking of adding a new language pair to the project, as there
is no existing MT system for English-Hindi and Hindi-English. Various
experiments can be done to build a state-of-the-art system.
Motivation:
===> For choosing Hindi: It can be kept as a pivot language for
various other Indian languages that have a similar word order.
===> Why? No existing good MT model for this language pair
Freely available parallel data released in WMT 14
Large monolingual data available
Parallel data in 10 Indian languages including Hindi (further
chaining can be tried)
Experiments in mind:
Data cleaning (the data is not really parallel, as it was formed by
extracting text from PDFs etc.)
Noise cleaning in the basic phrase table (removing misaligned pairs)
Applying a transfer grammar (a lab at LTRC, IIIT-Hyderabad has a transfer
grammar for re-ordering English into Hindi word order; this has been shown
to give better alignments)
Re-ranking (using an RNN-based LM with features such as those from the
MICA parser)
The other part of the project will be to assist the translator in this
CAT system with possible translations from translation memory and
MT output. The translator can choose among them and post-edit the result.
Do you think this could be a good idea?
Also, if there is any, I would be happy to know the progress of the
chaining experiment for the translation.
Regards,
Karan
LTRC, IIIT-Hyderabad
_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
--
Mikel L. Forcada (http://www.dlsi.ua.es/~mlf/)
Departament de Llenguatges i Sistemes Informàtics
Universitat d'Alacant
E-03071 Alacant, Spain
Phone: +34 96 590 9776
Fax: +34 96 590 9326