Dear Simon,
I would recommend that you use Philipp Koehn's Pharaoh decoder rather
than the ISI ReWrite decoder. Pharaoh uses phrase-based models of
statistical machine translation rather than the older IBM-style
word-based models.
You can download Pharaoh from
http://www.isi.edu/licensed-sw/pharaoh/
and additional training scripts from
http://www.iccs.informatics.ed.ac.uk/~pkoehn/training.tgz
A comprehensive manual accompanies the program, as does a conference
paper that you can cite in publications that use the program:
@inproceedings{Koehn2004,
  author = {Philipp Koehn},
  title = {Pharaoh: A Beam Search Decoder for Phrase-Based
           Statistical Machine Translation Models},
  booktitle = {Proceedings of AMTA},
  year = {2004},
  url = {http://www.iccs.informatics.ed.ac.uk/~pkoehn/publications/pharaoh-amta2004.pdf}
}
Also, if you're interested in tinkering with the innards of a
phrase-based decoder, I have started an open source project to build
one, and it is in a fairly mature state. Please e-mail me if you are
interested.
Yours,
Chris Callison-Burch
On Nov 2, 2005, at 2:19 AM, Simon Zwarts wrote:
Hi,
I started working with some of the well-known tools for statistical
machine translation and I have some questions about these tools. I was
wondering if there is a technical mailing list for SMT tools where
these questions can be asked, or failing that whether anyone here would
know the answers. At the moment the questions concern the GIZA++ tool
and the ISI ReWrite tool.
At the moment I have the following two questions:
ISI Rewrite system:
1) zerofert
I started following the instructions on
http://www.isi.edu/natural-language/software/decoder/manual.html
I tried to generate a zero-fertility file with the tool
rewrite.mkZeroFert.perl
This script needs two files: a vocabulary file, which I have, and
a .n4.final file (according to the comments in the script).
This .n4.final file did not come out of the GIZA++ run I did, nor is it
mentioned on a webpage I found explaining the output of GIZA++
(http://www-etud.iro.umontreal.ca/~demorali/traingizainfo.html)
Did I overlook a setting in GIZA++ or do I need another file?
2) distortion files
To calculate alignments one needs some probabilities for word order.
GIZA++ has:
D3Table # distortion table for model 3
D4Table # distortion table for model 4
All the .d3.final files I manage to create with GIZA++ have a "60" as
the third item, which, if I understand it correctly, means distortion
information is only available for sentences of length 60. This can't be
correct, but I have no clue what is going wrong. The .D4.final file
doesn't have any description on the webpage mentioned, but the numbers
look weird too. If I assume the relative probability is the number
given there divided by the sum, I get this result for the head of a
cept:
...
-3 0.0170141
-2 0.0167512
-1 0.0132766
0 0.000601109
1 0.590987
2 0.142046
3 0.0633281
...
where I have a very small probability of a word staying in the same
place, which seems to be quite unlikely for my language pair
(Dutch-English, Europarl).
All other files generated by Giza++ seem to make sense and be correct.
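
The normalization described above for the .D4.final numbers (each
value divided by the sum of all values) can be sketched as follows.
This is only an illustration under stated assumptions: the function
name normalize and the sample values are hypothetical, the values are
not taken from any real GIZA++ output, and nothing here confirms that
the .D4.final file is actually meant to be read this way.

```python
def normalize(values):
    """Divide each value by the sum of all values so the results sum to 1."""
    total = sum(values.values())
    if total <= 0:
        raise ValueError("values must sum to a positive number")
    return {offset: value / total for offset, value in values.items()}


# Hypothetical raw values keyed by relative offset (illustration only,
# NOT read from a real .D4.final file).
raw = {-1: 2.0, 0: 1.0, 1: 5.0, 2: 2.0}

probs = normalize(raw)
for offset in sorted(probs):
    print(offset, probs[offset])
```

With values like these, the largest relative probability lands on
offset 1 rather than offset 0, which is the same shape as the numbers
listed above.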
_______________________________________________
Mt-list mailing list