Europarl comes with a sentence aligner: http://statmt.org/europarl/v5/tools.tgz
You can also use hunalign: http://mokk.bme.hu/resources/hunalign
(look at the "realign" feature for lexical matching)

GMA: http://nlp.cs.nyu.edu/GMA/

Uplug includes all three and also a tool for interactive (semi-automatic)
sentence alignment:
http://sourceforge.net/projects/uplug/
http://www.let.rug.nl/~tiedeman/Uplug/php/

Jörg

Raphael Payen wrote:
> Hi
>
> From what I've seen, Moses, even with all the tools that go with it,
> requires a sentence-aligned bilingual corpus as its input. What if we
> only have an unaligned parallel corpus? Do you know if there are
> tools available to do this sentence-level alignment? There seems to
> be something in python-nltk, based on Gale & Church, but it is recent
> and not yet completely part of the package. Besides, the Gale & Church
> algorithm uses only sentence lengths; there are probably more
> powerful algorithms that use dictionaries or word alignment information?
> (I mean "static" dictionaries provided beforehand; I guess
> theoretically there could be ways to "dynamically" use a word aligner
> like GIZA on an unaligned corpus, compute some word alignments, use
> them to compute the sentence alignments, and feed this back into itself,
> but static dictionaries seem more practical.)
>
> Also, since this step usually requires human supervision, do you know
> if there are open-source / Unix GUI tools to assist in editing
> the proposed alignments (comparable to Trados WinAlign)?
>
> Best regards,
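
To make the "uses only sentence lengths" point in the quoted question concrete, here is a minimal Python sketch of length-based alignment in the style of Gale & Church. It is a simplified illustration under stated assumptions, not the code of any of the tools listed above: the names match_cost and align are hypothetical, and the priors, length ratio and variance are illustrative values in the spirit of the original paper that would need tuning for a real language pair.

```python
import math

# Assumed prior probability of each bead type (source count, target count).
# Illustrative values; tune them for your own corpus.
PRIORS = {(1, 1): 0.89, (1, 0): 0.0099, (0, 1): 0.0099,
          (2, 1): 0.089, (1, 2): 0.089, (2, 2): 0.011}
MEAN_RATIO = 1.0  # assumed expected target/source character ratio
VARIANCE = 6.8    # assumed variance of the length difference per character


def match_cost(src_len, tgt_len, prior):
    """Cost (negative log probability) of pairing two character lengths."""
    mean = (src_len + tgt_len / MEAN_RATIO) / 2.0
    if mean == 0:
        delta = 0.0
    else:
        delta = (tgt_len - src_len * MEAN_RATIO) / math.sqrt(mean * VARIANCE)
    # Two-sided tail of the standard normal: 2 * (1 - Phi(|delta|)).
    tail = max(math.erfc(abs(delta) / math.sqrt(2.0)), 1e-300)
    return -math.log(tail) - math.log(prior)


def align(src, tgt):
    """Return beads as (source_indices, target_indices), in document order."""
    n, m = len(src), len(tgt)
    src_len = [len(s) for s in src]
    tgt_len = [len(t) for t in tgt]
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Forward dynamic programming over prefixes of both texts.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), prior in PRIORS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = cost[i][j] + match_cost(sum(src_len[i:ni]),
                                            sum(tgt_len[j:nj]), prior)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Follow back-pointers from the end to recover the cheapest path.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads


if __name__ == "__main__":
    source = ["Hello.", "How are you today?", "Fine."]
    target = ["Bonjour.", "Comment allez-vous aujourd'hui ?", "Bien."]
    for s_idx, t_idx in align(source, target):
        print([source[k] for k in s_idx], "<->", [target[k] for k in t_idx])
```

Because nothing but character counts enters the cost, two sentences of similar length are paired regardless of their content. That is the weakness raised in the question, and it is the reason dictionary-based matching such as hunalign's "realign" step can help.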
