I think the bilingual sentence aligner by Bob Moore of Microsoft does what you want.
http://research.microsoft.com/en-us/people/bobmoore/

J
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Jörg Tiedemann
Sent: Monday, March 22, 2010 11:36 AM
To: Raphael Payen
Cc: [email protected]
Subject: Re: [Moses-support] Tool for segmenting the sentences of a bilingual corpus

Europarl comes with a sentence aligner:
http://statmt.org/europarl/v5/tools.tgz

You can also use
hunalign: http://mokk.bme.hu/resources/hunalign (look at the "realign" feature for lexical matching)
GMA: http://nlp.cs.nyu.edu/GMA/

Uplug includes all three and also a tool for interactive (semi-automatic) sentence alignment:
http://sourceforge.net/projects/uplug/
http://www.let.rug.nl/~tiedeman/Uplug/php/

Jörg

Raphael Payen wrote:
> Hi
>
> From what I've seen, Moses, even with all the tools that go with it,
> requires a sentence-aligned bilingual corpus as its input. What if we
> only have an unaligned parallel corpus? Do you know if there are
> tools available to do this sentence-level alignment? There seems to
> be something in python-nltk, based on Gale & Church, but it is recent
> and not yet completely part of the package. Besides, the Gale & Church
> algorithm uses only sentence lengths; there are probably more
> powerful algorithms, using dictionaries of word alignment information?
> (I mean "static" dictionaries provided beforehand; I guess
> theoretically there could be ways to "dynamically" use a word aligner
> like giza on an unaligned corpus, compute some word alignments, use
> them to compute the sentence alignments, and feed this back to itself, but
> static dictionaries seem more practical.)
>
> Also, since this step usually requires human supervision, do you know
> if there are open-source / Unix GUI tools to assist in editing
> the proposed alignments (comparable to Trados WinAlign)?
>
> Best regards,
>
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
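
For reference, the length-only idea that the question attributes to Gale & Church can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the published algorithm and not the python-nltk code: the cost is a plain relative length mismatch plus a fixed penalty for anything other than a 1-1 match (the real method uses a probabilistic model), and the name align_by_length and the 0.5 penalty are made up for this example.

```python
def align_by_length(src, tgt):
    """Align two lists of sentences using only their character lengths.

    Simplified sketch: dynamic programming over 1-1, 1-0, 0-1, 2-1 and
    1-2 groupings ("beads"). The cost function is an assumed toy cost,
    not the Gale & Church probabilistic model.
    """
    # Allowed bead shapes: (source sentences consumed, target sentences consumed).
    beads = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    def bead_cost(s_chunk, t_chunk):
        ls = sum(len(s) for s in s_chunk)
        lt = sum(len(t) for t in t_chunk)
        # Relative length mismatch in [0, 1].
        mismatch = abs(ls - lt) / max(ls, lt, 1)
        # Assumed fixed penalty so that 1-1 beads are preferred.
        penalty = 0.0 if (len(s_chunk), len(t_chunk)) == (1, 1) else 0.5
        return mismatch + penalty

    # Forward pass: relax every allowed bead out of each reachable cell.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in beads:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = cost[i][j] + bead_cost(src[i:ni], tgt[j:nj])
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)

    # Trace back from the end to recover the chosen beads in order.
    result = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        result.append((src[pi:i], tgt[pj:j]))
        i, j = pi, pj
    result.reverse()
    return result
```

Calling align_by_length(source_sentences, target_sentences) returns a list of (source group, target group) pairs. Because only lengths are compared, the lexical tools listed above (hunalign, GMA, Moore's aligner) should do better on noisy corpora.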
