Re: [Mt-list] SMT Tools

Chris Callison-Burch Wed, 02 Nov 2005 01:59:31 -0800

Dear Simon,

I would recommend that you use Philipp Koehn's Pharaoh decoder rather than the ISI ReWrite decoder. Pharaoh uses phrase-based models of statistical machine translation rather than the older IBM-style word-based models.


You can download Pharaoh from
        http://www.isi.edu/licensed-sw/pharaoh/

and additional training scripts from
        http://www.iccs.informatics.ed.ac.uk/~pkoehn/training.tgz

There is a comprehensive manual that accompanies the program, as well as a conference paper that you can cite in publications that use the program:


@inproceedings{Koehn2004,
  author =      {Philipp Koehn},
  title =       {Pharaoh: A Beam Search Decoder for Phrase-Based Statistical Machine Translation Models},
  booktitle =   {Proceedings of AMTA},
  year =        {2004},
  url =         {http://www.iccs.informatics.ed.ac.uk/~pkoehn/publications/pharaoh-amta2004.pdf}
}

Also, if you're interested in tinkering with the innards of a phrase-based decoder, I have started an open source project to create a phrase-based decoder, which is in a fairly mature state. Please e-mail me if you are interested.


Yours,
Chris Callison-Burch


On Nov 2, 2005, at 2:19 AM, Simon Zwarts wrote:

Hi,

I started working with some of the well-known tools for statistical
machine translation and I have some questions about these tools. I was
wondering if there is a technical mailing list for SMT tools where these
questions can be asked, or, failing that, whether anyone here would know the
answers. At the moment the questions concern the GIZA++ tool and the ISI
ReWrite tool.

At the moment I have the following two questions:

ISI Rewrite system:

1) zerofert

I started following the instructions on
http://www.isi.edu/natural-language/software/decoder/manual.html
I tried to generate a zero-fertility file with the tool rewrite.mkZeroFert.perl.
Now, this script needs two files: a vocabulary file (which I have) and
a .n4.final file (according to the comments in that file).
This file did not come out of the Giza++ run I did, nor is it mentioned on
the webpage I found explaining the output of Giza++
(http://www-etud.iro.umontreal.ca/~demorali/traingizainfo.html)
Did I overlook a setting in giza++ or do I need another file?

2) distortion files
For calculating alignments one needs some probabilities for word order.
Giza++ has:

D3Table          # distortion table for model 3
D4Table          # distortion table for model 4

Now, all the .d3.final files I manage to create with Giza++ have a "60" as
the third item, which, if I understand it correctly, means distortion
information is only available for sentences of length 60. This can't be
correct, but I have no clue what is going wrong.

The .D4.final file doesn't have any description on the webpage mentioned,
but the numbers look weird too. If I assume the relative probability is the
number mentioned there divided by the sum, I get this result for the head of
the cept:

...
-3 0.0170141
-2 0.0167512
-1 0.0132766
0 0.000601109
1 0.590987
2 0.142046
3 0.0633281
...
where I have a very small probability of a word staying in the same place,
which seems quite unlikely for my language pair (Dutch-English,
Europarl).
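The normalization described above (each raw table entry divided by the sum of all entries) can be sketched as follows. This is a minimal sketch under stated assumptions: it takes the table as a Python dict keyed by displacement rather than parsing a .D4.final file, whose layout is not described in this message, and the numbers in the usage section are hypothetical, not taken from any Giza++ output.

```python
def normalize(raw):
    """Return each entry of `raw` divided by the sum of all entries.

    `raw` maps a displacement (int) to a non-negative weight. The dict
    representation is an assumption; it is not the Giza++ file format.
    """
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("entries must sum to a positive value")
    return {key: value / total for key, value in raw.items()}


if __name__ == "__main__":
    # Hypothetical raw weights, for illustration only.
    raw = {-1: 2.0, 0: 1.0, 1: 5.0}
    for displacement, prob in sorted(normalize(raw).items()):
        print(displacement, prob)
```

With these hypothetical weights the script prints -1 0.25, 0 0.125 and 1 0.625, one pair per line; the normalized values always sum to 1.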

All other files generated by Giza++ seem to make sense and be correct.
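The observation about the .d3.final files (the third item is always 60) can be checked by tallying the distinct values in that field across the whole file. A minimal sketch, assuming the file is plain text with whitespace-separated fields on each line; the meaning of each column is not documented in this message, so the column index is a parameter, and the script name tally.py is hypothetical.

```python
import sys
from collections import Counter


def tally_column(lines, column):
    """Count how often each value occurs in a given field (0-based index).

    Lines with fewer than `column + 1` fields (e.g. blank lines) are skipped.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) > column:
            counts[fields[column]] += 1
    return counts


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python tally.py FILE")
    else:
        with open(sys.argv[1]) as handle:
            # Index 2 is the "third item" mentioned above.
            for value, count in tally_column(handle, 2).most_common():
                print(value, count)
```

If every line reports the same value in that field, the output is a single line, which makes the pattern easy to confirm or rule out.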
_______________________________________________
Mt-list mailing list

