Re: [Moses-support] Tool for segmenting the sentences of a bilingual corpus

Jörg Tiedemann Mon, 22 Mar 2010 08:37:54 -0700

Europarl comes with a sentence aligner:
http://statmt.org/europarl/v5/tools.tgz


You can also use hunalign:
http://mokk.bme.hu/resources/hunalign
(look at the "realign" feature for lexical matching)
or GMA:
http://nlp.cs.nyu.edu/GMA/

Uplug includes all three and also a tool for interactive 
(semi-automatic) sentence alignment:
http://sourceforge.net/projects/uplug/
http://www.let.rug.nl/~tiedeman/Uplug/php/
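
For orientation, here is a rough sketch of the length-based idea behind
Gale & Church that the question below refers to. It is not the published
algorithm and not what any of the tools above implement: the original uses
a probabilistic model of length ratios, whereas this sketch swaps in a
plain relative-length mismatch plus hand-picked penalties per alignment
type. The names BEADS, length_cost and align are made up for illustration.

```python
from typing import List, Tuple

# Allowed alignment types (source sentences, target sentences) and a
# penalty for each. The penalty values are illustrative assumptions,
# not the priors from the Gale & Church paper.
BEADS = {(1, 1): 0.0, (2, 1): 1.0, (1, 2): 1.0, (1, 0): 4.0, (0, 1): 4.0}


def length_cost(src_len: int, tgt_len: int) -> float:
    """Relative character-length mismatch of two spans, between 0 and 1."""
    if src_len == 0 and tgt_len == 0:
        return 0.0
    return abs(src_len - tgt_len) / max(src_len, tgt_len)


def align(src: List[str], tgt: List[str]) -> List[Tuple[List[int], List[int]]]:
    """Return a list of (source indices, target indices) pairs."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    # Dynamic programming over prefixes: cost[i][j] is the cheapest way
    # to align the first i source and first j target sentences.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + penalty + length_cost(s_len, t_len)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)

    # Follow the back-pointers from the end to recover the alignment.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads


if __name__ == "__main__":
    source = ["It rained.", "We stayed home."]
    target = ["Il pleuvait, donc nous sommes restés à la maison."]
    for s_idx, t_idx in align(source, target):
        print(s_idx, "<->", t_idx)
```

Because only character counts are compared, a sketch like this cannot tell
two sentences of similar length apart, which is why the lexical matching
offered by the tools above is usually worth having.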


Jörg


Raphael Payen wrote:
> Hi
> 
> From what I've seen, Moses, even with all the tools that go with it,
> requires a sentence-aligned bilingual corpus as its input. What if we
> only have an unaligned parallel corpus? Do you know if there are
> tools available to do this sentence-level alignment? There seems to
> be something in python-nltk, based on Gale & Church, but it is recent
> and not yet completely part of the package. Besides, the Gale & Church
> algorithm uses only sentence lengths; there are probably more
> powerful algorithms that use dictionaries or word alignment
> information? (I mean "static" dictionaries provided beforehand; I guess
> theoretically there could be ways to "dynamically" use a word aligner
> like GIZA on an unaligned corpus, compute some word alignments, use
> them to compute the sentence alignments, and feed this back to itself,
> but static dictionaries seem more practical.)
> 
> Also, since this step usually requires human supervision, do you know
> if there are open-source / Unix GUI tools to assist in editing
> the proposed alignments (comparable to Trados WinAlign)?
> 
> Best regards,
> 
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
