Hi Everyone,
Apologies for going slightly off topic but we have an exciting career
opportunity for a Linguistic Developer working on Machine Translation,
Translation Memory and Moses. The purpose of this role is to develop the use of
linguistic technology within one of the world’s fastest growing
Hi
From what I've seen, Moses, even with all the tools that go with it,
requires a sentence-aligned bilingual corpus as its input. What if we
only have an unaligned parallel corpus? Do you know if there are
tools available to do this sentence-level alignment? There seems to
be something in
Europarl comes with a sentence aligner:
http://statmt.org/europarl/v5/tools.tgz
You can also use hunalign:
http://mokk.bme.hu/resources/hunalign
(look at the realign feature for lexical matching)
GMA:
http://nlp.cs.nyu.edu/GMA/
Uplug includes all three and also a tool for interactive
Just in case you need a library - I recently packaged the Europarl
sentence splitter and sentence aligner tools into two Perl modules on
CPAN:
http://search.cpan.org/~achimru/Lingua-Sentence-1.00/
http://search.cpan.org/~achimru/Text-GaleChurch-1.00/
Achim
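
For anyone who wants to see roughly what a length-based sentence aligner
does before picking one of the tools above, here is a minimal Python sketch.
It is only an illustration: it is not the Europarl aligner, hunalign, GMA or
the Text-GaleChurch module, and it does not use the probabilistic cost model
of Gale and Church. It simply runs dynamic programming over sentence lengths
in characters, with fixed penalties that I made up (skip_penalty and
merge_penalty are hypothetical parameters), and allows 1:1, 1:0, 0:1, 2:1
and 1:2 groupings.

```python
def align(src, tgt, skip_penalty=50, merge_penalty=10):
    """Align two lists of sentences by character length.

    Returns a list of (source_indices, target_indices) pairs.
    The penalties are illustrative values, not tuned ones.
    """
    # Allowed groupings (source count, target count) and their fixed penalty.
    beads = {
        (1, 1): 0,
        (1, 0): skip_penalty,
        (0, 1): skip_penalty,
        (2, 1): merge_penalty,
        (1, 2): merge_penalty,
    }
    n, m = len(src), len(tgt)
    inf = float("inf")
    # cost[i][j]: cheapest way to align the first i source sentences
    # with the first j target sentences.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0

    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in beads.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                src_len = sum(len(s) for s in src[i:ni])
                tgt_len = sum(len(t) for t in tgt[j:nj])
                # Crude cost: length difference plus the grouping penalty.
                c = cost[i][j] + penalty + abs(src_len - tgt_len)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)

    # Walk the back-pointers from the end to recover the alignment.
    result = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        result.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    result.reverse()
    return result
```

Calling align(source_sentences, target_sentences) on two lists of already
split sentences gives back index groups that can be written out as one
sentence pair per line, which is the input format Moses expects. For real
corpora the tools listed above will do much better, since they model length
ratios properly or use lexical information.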
2010/3/22 Jörg Tiedemann