Hi, I have started working with some of the well-known tools for statistical machine translation (SMT) and have some questions about them. Is there a technical mailing list for SMT tools where such questions can be asked? Failing that, perhaps someone here knows the answers. At the moment my questions concern the GIZA++ tool and the ISI Rewrite decoder.
At the moment I have the following two questions.

ISI Rewrite system:

1) zerofert

I started following the instructions at http://www.isi.edu/natural-language/software/decoder/manual.html and tried to generate a zero-fertility file with the script rewrite.mkZeroFert.perl. According to the comments in the script, it needs two files: a vocabulary file, which I have, and a .n4.final file. However, my GIZA++ run did not produce such a file, and it is not mentioned on the page I found that explains the GIZA++ output (http://www-etud.iro.umontreal.ca/~demorali/traingizainfo.html). Did I overlook a setting in GIZA++, or do I need a different file?

2) distortion files

To calculate alignments, one needs probabilities for word order. GIZA++ has:

D3Table # distortion table for model 3
D4Table # distortion table for model 4

All the .d3.final files I manage to create with GIZA++ have "60" as the third item, which, if I understand it correctly, means that distortion information is only available for sentences of length 60. This can't be correct, but I have no clue what is going wrong.

The .D4.final file isn't described on the page mentioned above, but its numbers look odd too. If I take the relative probability to be each listed number divided by the sum, I get this result for the head of a cept:

...
-3 0.0170141
-2 0.0167512
-1 0.0132766
0 0.000601109
1 0.590987
2 0.142046
3 0.0633281
...

So a word has a very small probability of staying in the same place, which seems quite unlikely for my language pair (Dutch-English, Europarl).

All other files generated by GIZA++ seem to make sense and look correct.

_______________________________________________
Mt-list mailing list
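For anyone wanting to reproduce the two checks described in question 2, here is a minimal sketch. It assumes the .d3.final file is whitespace-separated with the sentence length in the third field (taken from the description in the question, not from GIZA++ documentation), and that the raw .D4.final values have already been read into a mapping from displacement to value; the helper names `third_field_counts` and `normalize` are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping


def third_field_counts(lines: Iterable[str]) -> Counter:
    """Tally the distinct values in the third whitespace-separated field.

    Assumption: each relevant line of the .d3.final file has at least three
    fields, the third being the sentence length. Shorter lines are skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


def normalize(values: Mapping[int, float]) -> dict:
    """Divide each raw value by the total, so the results sum to 1."""
    total = sum(values.values())
    if total <= 0:
        raise ValueError("values must have a positive sum")
    return {key: value / total for key, value in values.items()}


if __name__ == "__main__":
    # Relative probabilities for the head of a cept, as quoted in the question.
    # Only the displayed part of the table is included, so they need not sum to 1.
    head_of_cept = {
        -3: 0.0170141,
        -2: 0.0167512,
        -1: 0.0132766,
        0: 0.000601109,
        1: 0.590987,
        2: 0.142046,
        3: 0.0633281,
    }
    # Displacement with the highest probability among the shown entries.
    print(max(head_of_cept, key=head_of_cept.get))  # prints 1
    # For the .d3.final check (the path below is a placeholder):
    #   with open("path/to/file.d3.final") as f:
    #       print(third_field_counts(f))
```

If `third_field_counts` returns a single key, every line really does carry the same value in that field; several keys would suggest the file is being read differently than expected.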
