Hi,

From what I've seen, Moses, even with all the tools that go with it, requires a sentence-aligned bilingual corpus as its input. What if we only have a parallel corpus that is not yet aligned at the sentence level? Do you know if there are tools available to do this sentence-level alignment?

There seems to be something in python-nltk, based on Gale & Church, but it is recent and not yet completely part of the package. Besides, the Gale & Church algorithm uses only sentence lengths; there are probably more powerful algorithms that use dictionaries of word alignment information? (I mean "static" dictionaries provided beforehand. I guess that in theory one could "dynamically" run a word aligner like GIZA++ on an unaligned corpus, compute some word alignments, use them to compute the sentence alignments, and feed the result back to the word aligner, but static dictionaries seem more practical.)

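To make the length-only idea concrete, here is a minimal sketch of how I understand it. It is neither the python-nltk code nor the exact Gale & Church model: the `BEADS` penalties, the log-ratio `length_cost` and the `align` function are simplified assumptions of mine, chosen only to show the dynamic programming over sentence lengths.

```python
import math

# Allowed alignment patterns (source sentences, target sentences) and an
# illustrative fixed penalty for each one. These values are assumptions,
# not the probabilities used in the original paper.
BEADS = {(1, 1): 0.0, (2, 1): 2.0, (1, 2): 2.0, (1, 0): 4.0, (0, 1): 4.0}


def length_cost(src_len, tgt_len):
    """Penalty that grows with the mismatch between character lengths."""
    if src_len == 0 or tgt_len == 0:
        return 0.0  # insertions and deletions are charged through BEADS only
    return abs(math.log(src_len / tgt_len))


def align(src, tgt):
    """Return a list of (source indices, target indices) pairs."""
    n, m = len(src), len(tgt)
    src_len = [len(s) for s in src]
    tgt_len = [len(t) for t in tgt]
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    # Forward pass: extend every reachable prefix pair by one pattern.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = cost[i][j] + penalty + length_cost(
                    sum(src_len[i:ni]), sum(tgt_len[j:nj])
                )
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)

    # Backward pass: recover the cheapest sequence of patterns.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    pairs.reverse()
    return pairs
```

Calling `align(src, tgt)` on two lists of sentences returns pairs of index lists, one pair per aligned group. Such an approach knows nothing about the words themselves, which is why I am asking about dictionary-based alternatives.
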
Also, since this step usually requires human supervision, do you know if there are open-source / Unix GUI tools to assist in editing the proposed alignments (comparable to Trados WinAlign)?

Best regards,
--
Raphael Payen
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
