Chris Callison-Burch
Wed, 02 Nov 2005 01:59:31 -0800
Dear Simon,

I would recommend that you use Philipp Koehn's Pharaoh decoder rather than the ISI ReWrite decoder. Pharaoh uses phrase-based models of statistical machine translation rather than the older IBM-style word-based models.
You can download Pharaoh from
http://www.isi.edu/licensed-sw/pharaoh/
and additional training scripts from
http://www.iccs.informatics.ed.ac.uk/~pkoehn/training.tgz
There is a comprehensive manual that accompanies the program, as well
as a conference paper that you can cite in publications that use
the program:
@inproceedings{ Koehn2004,
author = {Philipp Koehn},
title = {Pharaoh: A Beam Search Decoder for Phrase-Based
Statistical Machine Translation Models},
booktitle = {Proceedings of AMTA},
year = {2004},
url = {http://www.iccs.informatics.ed.ac.uk/~pkoehn/publications/pharaoh-amta2004.pdf}
}

Also, if you're interested in tinkering with the innards of a phrase-based decoder, I have started an open-source project to create a phrase-based decoder, which is in a fairly mature state. Please e-mail me if you are interested.
Yours,
Chris Callison-Burch

On Nov 2, 2005, at 2:19 AM, Simon Zwarts wrote:
Hi,

I started working with some of the well-known tools for statistical machine translation, and I have some questions about them. I was wondering if there is a technical mailing list for SMT tools where these questions can be asked or, failing that, whether anyone here would know the answers. At the moment the questions concern the GIZA++ tool and the ISI ReWrite tool. I have the following two questions:

ISI ReWrite system:

1) zerofert

I started following the instructions on http://www.isi.edu/natural-language/software/decoder/manual.html and tried to generate a zero-fertility file with the tool rewrite.mkZeroFert.perl. According to the comments in that script, it needs two files: a vocabulary file, which I have, and a .n4.final file. This file did not come out of my GIZA++ run, nor is it mentioned on a webpage I found that explains the output of GIZA++ (http://www-etud.iro.umontreal.ca/~demorali/traingizainfo.html). Did I overlook a setting in GIZA++, or do I need another file?

2) distortion files

For calculating alignments one needs probabilities for word order. GIZA++ has:

D3Table # distortion table for model 3
D4Table # distortion table for model 4

All the .d3.final files I manage to create with GIZA++ have "60" as the third item, which, if I understand it correctly, means distortion information is only available for sentences of length 60. This can't be correct, but I have no clue what is going wrong.

The .D4.final file doesn't have any description on the webpage mentioned, but its numbers look weird too. If I assume the relative probability is the number listed there divided by the sum, I get this result for the head of cept:

...
-3 0.0170141
-2 0.0167512
-1 0.0132766
0 0.000601109
1 0.590987
2 0.142046
3 0.0633281
...

This gives a very small probability of a word staying in the same place, which seems quite unlikely for my language pair (Dutch-English, Europarl).

All other files generated by GIZA++ seem to make sense and look correct.
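The normalization described above (dividing each listed value by the sum over all displacements) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names `normalize` and `most_likely_displacement` and the input numbers are hypothetical placeholders, not real GIZA++ output, and the sketch does not parse the actual .D4.final file format.

```python
# Hypothetical sketch: convert raw per-displacement values into relative
# probabilities by dividing each value by the total, as described above.
# The numbers in the example are made-up placeholders, not GIZA++ output.


def normalize(values: dict[int, float]) -> dict[int, float]:
    """Divide each displacement's value by the total so the results sum to 1."""
    total = sum(values.values())
    if total <= 0:
        raise ValueError("values must have a positive sum")
    return {d: v / total for d, v in values.items()}


def most_likely_displacement(probs: dict[int, float]) -> int:
    """Return the displacement with the highest relative probability."""
    return max(probs, key=probs.get)


if __name__ == "__main__":
    # Placeholder raw values keyed by displacement (relative jump).
    raw = {-1: 2.0, 0: 1.0, 1: 5.0, 2: 2.0}
    probs = normalize(raw)
    for d in sorted(probs):
        print(f"{d:+d} {probs[d]:.3f}")
    print("most likely displacement:", most_likely_displacement(probs))
```

If GIZA++ follows the IBM Model 4 definition, in which the head-of-cept displacement is measured relative to the center of the previous cept, a word that simply follows its predecessor in monotone order is counted as +1 rather than 0, which may explain the peak at +1.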
_______________________________________________ Mt-list mailing list