Hi,
I am trying to run Moses on my own data by following the baseline system instructions.
After the step "Training the Translation System," the file training.out is
created. After tuning and testing, I decided to check the training.out file. A
segment of it (it's really long) is attached at the bottom of this message. I
noticed two issues.
First, I keep getting a "[Hill Climb / Model2 / ...] viterbi alignment has zero
score" warning. What does this warning mean, and how can I fix it? A quick
search of the mailing list archives revealed that someone solved this issue by
reducing the maximum sentence length of the corpus, but that doesn't explain
why the warnings occur.
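One possible reading of the Model2 warnings quoted below (a hypothesis, not a confirmed diagnosis): each "score" is the product a * t, and the "product" column is the running product of those scores (for example, 0.0105982 * 7.86017e-08 gives the logged 8.33033e-10). With target lengths of 78 to 95 words, such a product can drop below the smallest representable floating-point value and become exactly zero, which would also fit the reported fix of lowering the maximum sentence length. The sketch below only illustrates that arithmetic with Python doubles; it is not GIZA++'s scoring code, and the per-word score of 1e-5 for the unlogged words is a hypothetical value.

```python
import math
import sys

# Per-word scores (a * t) copied from the second Model2 warning in the log
# (source length 9, target length 78).
logged_scores = [0.0105982, 7.86017e-08, 6.78615e-08]

# Hypothetical: assume each of the remaining 75 target words scores about 1e-5,
# which is within the range of values seen in the log.
scores = logged_scores + [1e-5] * (78 - len(logged_scores))


def running_product(values):
    """Multiply values left to right and return every intermediate product."""
    products = []
    total = 1.0
    for v in values:
        total *= v
        products.append(total)
    return products


products = running_product(scores)
first_zero = next((i for i, p in enumerate(products) if p == 0.0), None)

print("smallest positive double:", sys.float_info.min * sys.float_info.epsilon)
print("product after 3 words:", products[2])  # approximately the logged 5.65309e-17
print("first 0-based word index where the product is exactly 0.0:", first_zero)

# Summing logarithms instead of multiplying keeps the score finite.
log_score = sum(math.log(v) for v in scores)
print("log of the full product:", log_score)
```

Working in log space is the usual way to avoid this kind of underflow; whether and where GIZA++ does so is not something this sketch shows.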
Second, and more important, my two GIZA++ word alignment files do not contain
the same number of sentences. When I ran the cleaning step (the one the
baseline system says to do right before "Language Model Training"), I limited
sentence length to 100 words. But after checking my two alignment files
(foreign -> english and english -> foreign), I found that some sentences are
missing from the foreign -> english file. I read in the mailing list archives
that cleaning the corpus would solve this issue, but I am sure that I ran the
cleaning script to limit sentence length to 100 words before training (i.e.,
before running GIZA++ on it). So what else could be the issue? Because one
missing sentence shifts every later alignment off by one, I get many sentence
mismatch errors.
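The log shows giza2bal.pl reading both A3.final files and piping them to symal, so a single pair missing from one file would misalign everything after it. A minimal sketch for locating the first missing pair, assuming each pair in an A3.final file starts with a header of the form "# Sentence pair (N) source length X target length Y ...". Because the inverse run swaps source and target, pair i should show (X, Y) in one file and (Y, X) in the other; the script name check_a3.py is hypothetical.

```python
import gzip
import re
import sys

# Assumed GIZA++ A3.final header, one per sentence pair, e.g.
# "# Sentence pair (1) source length 9 target length 11 alignment score : 1e-05"
HEADER = re.compile(r"^# Sentence pair \((\d+)\) source length (\d+) target length (\d+)")


def read_headers(lines):
    """Return (pair_id, source_len, target_len) for every header line."""
    headers = []
    for line in lines:
        m = HEADER.match(line)
        if m:
            headers.append(tuple(int(g) for g in m.groups()))
    return headers


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, encoding="utf-8", errors="replace")


def first_mismatch(forward, inverse):
    """0-based index of the first pair whose swapped lengths disagree, or None."""
    for i, (f, b) in enumerate(zip(forward, inverse)):
        if (f[1], f[2]) != (b[2], b[1]):
            return i
    if len(forward) != len(inverse):
        # One file simply ends early.
        return min(len(forward), len(inverse))
    return None


if __name__ == "__main__":
    if len(sys.argv) == 3:
        with open_text(sys.argv[1]) as fh:
            fwd = read_headers(fh)
        with open_text(sys.argv[2]) as fh:
            inv = read_headers(fh)
        print(f"{len(fwd)} pairs vs {len(inv)} pairs")
        idx = first_mismatch(fwd, inv)
        print("no mismatch found" if idx is None else f"first mismatch at pair index {idx}")
    else:
        print("usage: check_a3.py FORWARD.A3.final[.gz] INVERSE.A3.final[.gz]")
```

For example: `python check_a3.py giza.norm-simp/norm-simp.A3.final.gz giza.simp-norm/simp-norm.A3.final.gz`. Two different pairs can share the same lengths by chance, so the reported index is where the drift is first detectable, not necessarily the exact dropped pair.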
Thanks for any help you can offer. I've put a segment of the "training.out"
file at the bottom, with [...] marking places where I left out many lines
because they were highly repetitive.
Sincerely,
Daniel Seita
[...]
0.0114939 j:74 i:8; NP 6.31447e-06 AP1 0.01262 j:75 i:6; NP 3.15751e-06 AP1
0.0110491 j:76 i:12; NP 3.15751e-06 AP1 0.0106323 j:77 i:11; NP 3.15751e-06
AP1 0.535517 j:78 i:12; NP 3.15751e-06 AP1 0.0104985 j:79 i:6; NP 0.358595
AP1 0.0117257 j:80 i:4; NP 2.10507e-06 AP1 0.0104082 j:81 i:11; NP
0.000320498 AP1 0.011022 j:82 i:6; NP 1.07495e-05 AP1 0.0112897 AP2 0.02392
j:83 i:11;
WARNING: Hill Climbing yielded a zero score viterbi alignment for the following
pair:
AL(l:15,m:84)(a: 14 0 14 14 14 14 14 0 0 14 7 14 0 14 6 1 0 5 0 0 5 0 0 15 15
15 1 15 6 15 1 15 15 5 15 15 7 11 8 3 2 3 5 4 13 2 2 5 1 2 1 2 2 3 1 1 3 3 1 2
11 3 1 5 13 7 5 6 5 2 7 7 7 2 9 7 13 12 13 7 5 12 7 12 )(fert: 9 9 9 6 1 9 3 9
1 1 0 2 3 4 9 9 ) c:
Source sentence length : 15 , target : 84
20 169 19 92 5 19 4 20 116 29 75 906 89 33643 3
116639 6 1069 213 247 5372 24011 17 19 2319 3 5328 6 1112 21 28 6405 5 2 4332 5
24011 17 19 178 7 112 8313 20 1042 106 2 1563 5 37 4189 60 8 2135 29 24011
111119 6 47 72 26118 5603 6 2 970 296 68 12 36604 249 538 700 305 4327 2680 366
305 3288 5 2680 3 2053 24011 6509 12 24011 10624 3 12 16215 201 249 538 700
3288 5 305 4 12
WARNING: Model2 viterbi alignment has zero score.
Here are the different elements that made this alignment probability zero
Source length 15 target length 95
best: fs[1] 1 : es[1] 1 , a: 0.897485 t: 0.00395663 score 0.00355102 product
: 0.00355102 ss 0
best: fs[2] 2 : es[10] 10 , a: 0.00624286 t: 0.0126491 score 7.89665e-05
product : 2.80411e-07 ss 0
best: fs[3] 3 : es[0] 0 , a: 0.0574229 t: 0.120728 score 0.00693258 product
: 1.94397e-09 ss 0
best: fs[4] 4 : es[12] 12 , a: 0.00893218 t: 0.987925 score 0.00882433
product : 1.71542e-11 ss 0
[...]
WARNING: Model2 viterbi alignment has zero score.
Here are the different elements that made this alignment probability zero
Source length 9 target length 78
best: fs[1] 1 : es[3] 3 , a: 0.0109157 t: 0.970912 score 0.0105982 product :
0.0105982 ss 0
best: fs[2] 2 : es[2] 2 , a: 0.786017 t: 1e-07 score 7.86017e-08 product :
8.33033e-10 ss 0
best: fs[3] 3 : es[3] 3 , a: 0.678615 t: 1e-07 score 6.78615e-08 product :
5.65309e-17 ss 0
[...]
Executing: rm -f
/home/dseita/KauchakWorking/train/giza.norm-simp/norm-simp.A3.final.gz
Executing: gzip
/home/dseita/KauchakWorking/train/giza.norm-simp/norm-simp.A3.final
Waiting for second GIZA process...
(3) generate word alignment @ Mon Jun 25 18:00:13 EDT 2012
Combining forward and inverted alignment from files:
/home/dseita/KauchakWorking/train/giza.norm-simp/norm-simp.A3.final.{bz2,gz}
/home/dseita/KauchakWorking/train/giza.simp-norm/simp-norm.A3.final.{bz2,gz}
Executing: mkdir -p /home/dseita/KauchakWorking/train/model
Executing: /home/dseita/mosesdecoder/scripts/training/giza2bal.pl -d "gzip -cd
/home/dseita/KauchakWorking/train/giza.simp-norm/simp-norm.A3.final.gz" -i
"gzip -cd
/home/dseita/KauchakWorking/train/giza.norm-simp/norm-simp.A3.final.gz"
|/home/dseita/mosesdecoder/scripts/../bin/symal -alignment="grow"
-diagonal="yes" -final="yes" -both="no" >
/home/dseita/KauchakWorking/train/model/aligned.grow-diag-final
symal: computing grow alignment: diagonal (1) final (1)both-uncovered (0)
Sentence mismatch error! Line #86
Sentence mismatch error! Line #87
Sentence mismatch error! Line #88
Sentence mismatch error! Line #89
Sentence mismatch error! Line #90
Sentence mismatch error! Line #91
Sentence mismatch error! Line #92
Sentence mismatch error! Line #93
Sentence mismatch error! Line #94
[...Mismatch errors continue...]
[...]
_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support