[Moses-support] Error with word alignment, missing sentences in training.out

Daniel.T.Seita Tue, 26 Jun 2012 09:08:59 -0700

Hi,

I am trying to run Moses on my own data by following the baseline system
instructions. The step "Training the Translation System" creates the file
training.out. After tuning and testing, I checked training.out; a segment of it
(the full file is very long) is attached at the bottom of this message. I
noticed two issues.


First, I keep getting a "[Hill Climb / Model2 / ...] viterbi alignment has zero
score" warning. What does this warning mean, and how can I fix it? A quick
search of the mailing list archives showed that someone solved this issue by
reducing the maximum sentence length of the corpus, but that doesn't explain
why the warnings occur.
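To make the question concrete, here is a minimal sketch (my own Python
illustration, not GIZA++ code) of one way such a score could reach exactly zero.
It assumes the "product" column in the Model2 warnings in the log below is a
running product of per-word a * t scores, which the logged numbers are
consistent with (0.897485 * 0.00395663 is about 0.00355102, and so on).
Multiplying many small factors in ordinary floating point underflows to 0.0 for
long sentences, while a sum of logarithms stays finite. Whether this is what
actually triggers the warning inside GIZA++ is an assumption; the 84-word,
1e-5-per-word input is purely illustrative.

```python
import math


def running_product(scores):
    """Multiply per-word scores left to right in ordinary float arithmetic,
    returning the cumulative product after each word."""
    product = 1.0
    history = []
    for s in scores:
        product *= s
        history.append(product)
    return history


def log_score(scores):
    """The same quantity kept in log space: a sum of natural logarithms."""
    return sum(math.log(s) for s in scores)


# (a, t) pairs copied from the first Model2 warning in the log (first 4 words).
logged = [(0.897485, 0.00395663), (0.00624286, 0.0126491),
          (0.0574229, 0.120728), (0.00893218, 0.987925)]
scores = [a * t for a, t in logged]
print(running_product(scores))  # roughly 3.55e-03, 2.80e-07, 1.94e-09, 1.72e-11

# Hypothetical long pair: 84 target words, each scoring about 1e-5.
long_scores = [1e-5] * 84
print(running_product(long_scores)[-1])  # 0.0: the double-precision product underflowed
print(log_score(long_scores))            # about -967, still an ordinary finite number
```

If this reading is right, it would also explain why capping sentence length
makes the warnings go away: fewer factors means the product stays above the
smallest representable value.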

Second, and more importantly, my two GIZA++ word alignment files do not seem
to have the same length. When I ran the cleaning step (the one the baseline
system says to run right before "Language Model Training"), I limited sentence
lengths to 100. But after checking my two alignment files (foreign -> english
and english -> foreign), I found that some sentences are missing from the
foreign -> english file. I read in the mailing list archives that cleaning the
corpus would solve this issue, but I am sure that I ran the cleaning script to
limit sentence length to 100 words before training (i.e., before running
GIZA++ on the data). So what else could be the issue? There are many mismatch
errors, because a single missing sentence shifts every subsequent alignment off
by one.
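One thing worth checking is whether the cleaned corpus still contains pairs an
aligner might silently skip. The sketch below is a hypothetical checker, not a
Moses tool. It flags differing line counts, empty sides, pairs over 100 tokens,
pairs whose length ratio exceeds an assumed limit of 9 (the second Model2
warning in the log below has a 9-word source against a 78-word target, a ratio
of about 8.7), and stray characters such as \r or U+2028 that some tools treat
as line breaks even though they are not a plain newline. Both limits are
parameters chosen for illustration, not values read from Moses or GIZA++.

```python
# Hypothetical sanity checker for a cleaned parallel corpus (not a Moses tool).
# The default limits are assumptions, not values taken from Moses or GIZA++.

# Characters other than LF that Python's str.splitlines() treats as line
# boundaries; other tools may split on some of them as well.
EXTRA_BREAKS = "\r\x0b\x0c\x1c\x1d\x1e\x85\u2028\u2029"


def read_lines(path):
    """Read a UTF-8 file, splitting on LF only and keeping other break characters."""
    # newline="" turns off universal-newline translation, so a stray CR survives.
    with open(path, encoding="utf-8", newline="") as f:
        lines = f.read().split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty string after the final newline
    return lines


def check_parallel_corpus(src_lines, tgt_lines, max_len=100, max_ratio=9.0):
    """Return (line_number, problem) tuples; line 0 means a file-level problem."""
    problems = []
    if len(src_lines) != len(tgt_lines):
        problems.append((0, f"line counts differ: {len(src_lines)} vs {len(tgt_lines)}"))
    for n, (src, tgt) in enumerate(zip(src_lines, tgt_lines), start=1):
        for side, line in (("source", src), ("target", tgt)):
            if any(ch in line for ch in EXTRA_BREAKS):
                problems.append((n, f"{side} contains a line-break character other than LF"))
        ls, lt = len(src.split()), len(tgt.split())  # whitespace tokenization
        if ls == 0 or lt == 0:
            problems.append((n, "empty side"))
        elif max(ls, lt) > max_len:
            problems.append((n, f"more than {max_len} tokens"))
        elif max(ls, lt) / min(ls, lt) > max_ratio:
            problems.append((n, f"length ratio {max(ls, lt) / min(ls, lt):.1f} exceeds {max_ratio}"))
    return problems


# Usage (file names are placeholders for the two cleaned corpus sides):
# for n, problem in check_parallel_corpus(read_lines("corpus.clean.norm"),
#                                         read_lines("corpus.clean.simp")):
#     print(n, problem)
```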

Thanks for any help you can offer. I've put a segment of the "training.out"
file at the bottom, with [...] marking places where I left out many lines
because they were highly repetitive.

Sincerely,
Daniel Seita


[...]
 0.0114939 j:74 i:8;  NP 6.31447e-06 AP1 0.01262 j:75 i:6;  NP 3.15751e-06 AP1 
0.0110491 j:76 i:12;  NP 3.15751e-06 AP1 0.0106323 j:77 i:11;  NP 3.15751e-06 
AP1 0.535517 j:78 i:12;  NP 3.15751e-06 AP1 0.0104985 j:79 i:6;  NP 0.358595 
AP1 0.0117257 j:80 i:4;  NP 2.10507e-06 AP1 0.0104082 j:81 i:11;  NP 
0.000320498 AP1 0.011022 j:82 i:6;  NP 1.07495e-05 AP1 0.0112897 AP2 0.02392 
j:83 i:11;
WARNING: Hill Climbing yielded a zero score viterbi alignment for the following pair:
AL(l:15,m:84)(a: 14 0 14 14 14 14 14 0 0 14 7 14 0 14 6 1 0 5 0 0 5 0 0 15 15 
15 1 15 6 15 1 15 15 5 15 15 7 11 8 3 2 3 5 4 13 2 2 5 1 2 1 2 2 3 1 1 3 3 1 2 
11 3 1 5 13 7 5 6 5 2 7 7 7 2 9 7 13 12 13 7 5 12 7 12 )(fert: 9 9 9 6 1 9 3 9 
1 1 0 2 3 4 9 9 )  c:
Source sentence length : 15 , target : 84
20 169 19 92 5 19 4 20 116 29 75 906 89 33643 3
116639 6 1069 213 247 5372 24011 17 19 2319 3 5328 6 1112 21 28 6405 5 2 4332 5 
24011 17 19 178 7 112 8313 20 1042 106 2 1563 5 37 4189 60 8 2135 29 24011 
111119 6 47 72 26118 5603 6 2 970 296 68 12 36604 249 538 700 305 4327 2680 366 
305 3288 5 2680 3 2053 24011 6509 12 24011 10624 3 12 16215 201 249 538 700 
3288 5 305 4 12
WARNING: Model2 viterbi alignment has zero score.
Here are the different elements that made this alignment probability zero
Source length 15 target length 95
best: fs[1] 1  : es[1] 1 ,  a: 0.897485 t: 0.00395663 score 0.00355102  product : 0.00355102 ss 0
best: fs[2] 2  : es[10] 10 ,  a: 0.00624286 t: 0.0126491 score 7.89665e-05  product : 2.80411e-07 ss 0
best: fs[3] 3  : es[0] 0 ,  a: 0.0574229 t: 0.120728 score 0.00693258  product : 1.94397e-09 ss 0
best: fs[4] 4  : es[12] 12 ,  a: 0.00893218 t: 0.987925 score 0.00882433  product : 1.71542e-11 ss 0
[...]
WARNING: Model2 viterbi alignment has zero score.
Here are the different elements that made this alignment probability zero
Source length 9 target length 78
best: fs[1] 1  : es[3] 3 ,  a: 0.0109157 t: 0.970912 score 0.0105982  product : 0.0105982 ss 0
best: fs[2] 2  : es[2] 2 ,  a: 0.786017 t: 1e-07 score 7.86017e-08  product : 8.33033e-10 ss 0
best: fs[3] 3  : es[3] 3 ,  a: 0.678615 t: 1e-07 score 6.78615e-08  product : 5.65309e-17 ss 0
[...]
Executing: rm -f /home/dseita/KauchakWorking/train/giza.norm-simp/norm-simp.A3.final.gz
Executing: gzip /home/dseita/KauchakWorking/train/giza.norm-simp/norm-simp.A3.final
Waiting for second GIZA process...
(3) generate word alignment @ Mon Jun 25 18:00:13 EDT 2012
Combining forward and inverted alignment from files:
  /home/dseita/KauchakWorking/train/giza.norm-simp/norm-simp.A3.final.{bz2,gz}
  /home/dseita/KauchakWorking/train/giza.simp-norm/simp-norm.A3.final.{bz2,gz}
Executing: mkdir -p /home/dseita/KauchakWorking/train/model
Executing: /home/dseita/mosesdecoder/scripts/training/giza2bal.pl -d "gzip -cd /home/dseita/KauchakWorking/train/giza.simp-norm/simp-norm.A3.final.gz" -i "gzip -cd /home/dseita/KauchakWorking/train/giza.norm-simp/norm-simp.A3.final.gz" |/home/dseita/mosesdecoder/scripts/../bin/symal -alignment="grow" -diagonal="yes" -final="yes" -both="no" > /home/dseita/KauchakWorking/train/model/aligned.grow-diag-final
symal: computing grow alignment: diagonal (1) final (1)both-uncovered (0)
Sentence mismatch error! Line #86
Sentence mismatch error! Line #87
Sentence mismatch error! Line #88
Sentence mismatch error! Line #89
Sentence mismatch error! Line #90
Sentence mismatch error! Line #91
Sentence mismatch error! Line #92
Sentence mismatch error! Line #93
Sentence mismatch error! Line #94
[...Mismatch errors continue...]
[...]
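To locate the first sentence pair that goes missing (the symal errors above
start at line #86), a sketch like the following could compare the two
*.A3.final.gz files named in the log. It assumes each sentence pair in an
A3.final file begins with a header line of the form "# Sentence pair (N) source
length ... target length ... alignment score : ...". I have not confirmed
whether a skipped pair leaves a gap in that numbering or only makes the file
shorter, so the function reports either case.

```python
import gzip
import re

# Assumed header format of a GIZA++ *.A3.final file (one header per sentence pair).
PAIR_HEADER = re.compile(r"^# Sentence pair \((\d+)\)")


def ids_from_lines(lines):
    """Return the sentence-pair numbers found in header lines, in file order."""
    ids = []
    for line in lines:
        match = PAIR_HEADER.match(line)
        if match:
            ids.append(int(match.group(1)))
    return ids


def pair_ids(path):
    """Read sentence-pair numbers from a plain or gzip-compressed A3.final file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as f:
        return ids_from_lines(f)


def first_divergence(ids_a, ids_b):
    """Return (position, id_a, id_b) at the first disagreement, or None.

    An id of None means that file has already ended at that position."""
    for pos, (a, b) in enumerate(zip(ids_a, ids_b), start=1):
        if a != b:
            return pos, a, b
    n = min(len(ids_a), len(ids_b))
    if len(ids_a) != len(ids_b):
        a = ids_a[n] if len(ids_a) > n else None
        b = ids_b[n] if len(ids_b) > n else None
        return n + 1, a, b
    return None


# Usage, with the paths from the log above (relative to the train directory):
# fwd = pair_ids("giza.norm-simp/norm-simp.A3.final.gz")
# inv = pair_ids("giza.simp-norm/simp-norm.A3.final.gz")
# print(len(fwd), len(inv), first_divergence(fwd, inv))
# print(sorted(set(inv) - set(fwd))[:10])  # numbers present only in the second file
```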
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
