Dear Simon,

I would recommend that you use Philipp Koehn's Pharaoh decoder rather than the ISI ReWrite decoder. Pharaoh uses phrase-based models of statistical machine translation rather than the older IBM-style word-based models.

You can download Pharaoh from
        http://www.isi.edu/licensed-sw/pharaoh/

and additional training scripts from
        http://www.iccs.informatics.ed.ac.uk/~pkoehn/training.tgz

There is a comprehensive manual that accompanies the program, as well as a conference paper that you can cite in publications that use the program:

@inproceedings{Koehn2004,
  author =      {Philipp Koehn},
  title =       {Pharaoh: A Beam Search Decoder for Phrase-Based Statistical Machine Translation Models},
  booktitle =   {Proceedings of AMTA},
  year =        {2004},
  url =         {http://www.iccs.informatics.ed.ac.uk/~pkoehn/publications/pharaoh-amta2004.pdf}
}

Also, if you're interested in tinkering with the innards of a phrase-based decoder, I have started an open source project to build one, and it is in a fairly mature state. Please e-mail me if you are interested.

Yours,
Chris Callison-Burch


On Nov 2, 2005, at 2:19 AM, Simon Zwarts wrote:

Hi,

I started working with some of the well-known tools for statistical
machine translation and I have some questions about these tools. I was
wondering if there is a technical mailing list for SMT tools where these
questions can be asked, or failing that, whether anyone here would know
the answers. At the moment the questions concern the GIZA++ tool and the
ISI ReWrite tool.

At the moment I have the following two questions:

ISI Rewrite system:

1) zerofert

I started following the instructions on
http://www.isi.edu/natural-language/software/decoder/manual.html
I tried to generate a zero-fertility file with the tool rewrite.mkZeroFert.perl.
This script needs two files: a vocabulary file, which I have, and
a .n4.final file (according to the comments in that file).

This file did not come out of the Giza++ run I did, nor is it mentioned on
a webpage I found explaining the output of Giza++
(http://www-etud.iro.umontreal.ca/~demorali/traingizainfo.html).
Did I overlook a setting in Giza++, or do I need another file?

2) distortion files

For calculating alignments, one needs to use some probabilities for word order.

Giza++ has:

D3Table          # distortion table for model 3
D4Table          # distortion table for model 4

All the .d3.final files I manage to create with Giza++ have a "60" as the third item, which, if I understand it correctly, means distortion information is only available for sentences of length 60. This can't be correct, but I have no clue what is going wrong. The .D4.final file doesn't have any description on the webpage mentioned, but the numbers look weird too. If I assume the relative probability is the number mentioned there divided by
the sum, I get this result for the head of cept:

...
-3 0.0170141
-2 0.0167512
-1 0.0132766
0 0.000601109
1 0.590987
2 0.142046
3 0.0633281
...

where I have a very small probability of a word staying in the same place,
which seems to be quite unlikely for my language pair (Dutch-English,
Europarl)
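
The normalization described above (each value divided by the sum of all values) can be sketched as follows. This is only an illustration of the arithmetic, assuming one raw value per relative displacement; the function name and the input numbers are hypothetical and are not taken from any Giza++ output or file format.

```python
def normalize(raw_scores):
    """Divide each raw value by the total so the results sum to 1."""
    total = sum(raw_scores.values())
    if total <= 0:
        raise ValueError("raw scores must sum to a positive value")
    return {disp: value / total for disp, value in raw_scores.items()}


# Hypothetical raw values keyed by relative displacement
# (made up for illustration, not real Giza++ numbers).
raw = {-1: 2.0, 0: 1.0, 1: 5.0, 2: 2.0}
probs = normalize(raw)

for disp in sorted(probs):
    print(disp, probs[disp])
```

If the resulting distribution looks implausible, it may be worth checking whether the raw column really holds unnormalized values before dividing by the sum.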

All other files generated by Giza++ seem to make sense and be correct.
_______________________________________________
Mt-list mailing list
