Hi,

I have started working with some of the well-known tools for statistical
machine translation, and I have some questions about them. Is there a
technical mailing list for SMT tools where such questions can be asked?
Failing that, does anyone here know the answers? For now, my questions
concern GIZA++ and the ISI ReWrite tool.

At the moment I have the following two questions:

ISI Rewrite system:

1) zerofert

I followed the instructions at
http://www.isi.edu/natural-language/software/decoder/manual.html
and tried to generate a zero-fertility file with the script
rewrite.mkZeroFert.perl. According to the comments in that script, it
needs two files: a vocabulary file, which I have, and a .n4.final file.

However, my GIZA++ run did not produce this file, and it is not mentioned
on a web page I found that explains the GIZA++ output
(http://www-etud.iro.umontreal.ca/~demorali/traingizainfo.html).
Did I overlook a GIZA++ setting, or do I need another file?
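To check which model files a run actually produced, a small Python helper
can list every *.final file in the GIZA++ output directory and report
whether a .n4.final file is among them. This is only a sketch that assumes
all files from the run were written to one directory; the function names
and the command-line usage are hypothetical and are not part of GIZA++ or
the ReWrite tools.

```python
import sys
from pathlib import Path


def final_files(output_dir):
    """Return the sorted names of all *.final files in output_dir."""
    return sorted(p.name for p in Path(output_dir).glob("*.final"))


def has_suffix(names, suffix):
    """True if any file name ends with suffix, e.g. ".n4.final"."""
    return any(name.endswith(suffix) for name in names)


# Hypothetical usage: python list_final.py /path/to/giza/output
if __name__ == "__main__" and len(sys.argv) > 1:
    names = final_files(sys.argv[1])
    for name in names:
        print(name)
    print("has .n4.final:", has_suffix(names, ".n4.final"))
```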

2) distortion files

To calculate alignments, one needs probabilities for word order.

GIZA++ has:

D3Table          # distortion table for model 3
D4Table          # distortion table for model 4

However, all the .d3.final files I have managed to create with GIZA++ have
"60" as the third item, which, if I understand it correctly, means that
distortion information is only available for sentences of length 60. This
can't be right, but I have no idea what is going wrong. The .d4.final file
isn't described on the web page mentioned above, but its numbers look odd
too. If I assume that the relative probability is the listed number divided
by the sum of all the numbers, I get this result for the head of a cept:

...
-3 0.0170141
-2 0.0167512
-1 0.0132766
0 0.000601109
1 0.590987
2 0.142046
3 0.0633281
...

That is, a word has a very small probability of staying in the same place,
which seems quite unlikely for my language pair (Dutch-English, Europarl).
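Both observations above can be checked mechanically. The Python sketch
below counts the distinct values in the third whitespace-separated field
of a .d3.final file, and normalizes raw table values by dividing each one
by their sum, which is the assumption made above for the .d4.final numbers.
It assumes one entry per line with whitespace-separated fields and makes no
claim about what each column means in GIZA++; the function names and the
script usage are hypothetical.

```python
import sys
from collections import Counter


def normalize(weights):
    """Divide each raw value by the sum of all values so the results sum to 1.

    weights maps a relative displacement (e.g. -1, 0, 1) to the raw number
    read from the table.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("values must have a positive sum")
    return {offset: value / total for offset, value in weights.items()}


def third_column_counts(lines):
    """Count how often each value occurs in the third whitespace-separated field.

    Lines with fewer than three fields (e.g. blank lines) are skipped.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


# Hypothetical usage: python inspect_tables.py prefix.d3.final
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        for value, n in third_column_counts(f).most_common(10):
            print(value, n)
```

If the script prints only a single value for a .d3.final file, every entry
in that file shares the same third item.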

All the other files generated by GIZA++ seem sensible and correct.
_______________________________________________
Mt-list mailing list