Re: [Moses-support] Error with word alignment, missing sentences in training.out

Hieu Hoang Thu, 28 Jun 2012 20:47:53 -0700

Hi Daniel,

If you're just trying to build a vanilla Moses system, I would
recommend running Moses through the experiment pipeline (EMS, the
Experiment Management System).

It runs each step of the MT pipeline for you, so you don't miss a step
or run one incorrectly.
You can find some info about it here:
    http://www.statmt.org/moses/?n=FactoredTraining.EMS
You can see an example of a config file for it in the directory:
    scripts/ems/example
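
If you want to sanity-check the cleaned corpus yourself before training, here is a rough sketch in Python. It is not part of Moses; the function name check_parallel and the max_ratio value of 9.0 are my own assumptions for illustration, not a documented GIZA++ limit. It flags pairs where one side is empty, too long, or much longer than the other (the log quoted below shows pairs such as 15 vs 84 words).

```python
def check_parallel(src_lines, tgt_lines, max_len=100, max_ratio=9.0):
    """Return (line_number, reason) for sentence pairs that look suspicious.

    max_len mirrors the 100-word limit mentioned in the quoted message;
    max_ratio is an assumed threshold, adjust it for your own data.
    """
    src_lines = list(src_lines)
    tgt_lines = list(tgt_lines)
    problems = []
    # The two sides of a parallel corpus must have the same number of lines.
    if len(src_lines) != len(tgt_lines):
        problems.append(
            (0, "line counts differ: %d vs %d" % (len(src_lines), len(tgt_lines)))
        )
    for n, (src, tgt) in enumerate(zip(src_lines, tgt_lines), start=1):
        src_len = len(src.split())
        tgt_len = len(tgt.split())
        if src_len == 0 or tgt_len == 0:
            problems.append((n, "empty side"))
        elif src_len > max_len or tgt_len > max_len:
            problems.append((n, "too long"))
        elif max(src_len, tgt_len) / min(src_len, tgt_len) > max_ratio:
            problems.append((n, "length ratio"))
    return problems
```

To use it on real files, read both sides of the cleaned corpus with open(path, encoding="utf-8") and pass the two file objects in; an empty result means none of these three checks fired.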

On 26/06/2012 12:07, [email protected] wrote:
> Hi,
>
> I am trying to run Moses by following the baseline system for my own data. 
> After the step "Training the Translation System," the file training.out is 
> created. After tuning and testing, I decided to check the training.out file. 
> A segment of it (it's really long) is attached at the bottom of this message. 
> I noticed two issues.
>
> First, I keep getting a "[Hill Climb / Model2 / ...]viterbi alignment has 
> zero score" warning. What does this warning mean, and how can I rectify this? 
> A quick search of the mailing archives revealed that someone solved this 
> issue by reducing the maximum sentence length of the corpus, but 
> this doesn't explain why the warnings occur.
>
> Second, and the more important of my two questions, my GIZA word alignment 
> files do not seem to be of the same length. When I ran the cleaning (the step 
> that the baseline system says to do right before the "Language Model 
> Training"), I limited sentence lengths to be 100. But after checking my two 
> alignment files (foreign -> English and English -> foreign), some sentences 
> are missing in the foreign -> English file. I read in the mailing archives 
> that someone said cleaning the corpus would solve this issue, but I am sure 
> that I ran the cleaning script on the data to limit sentence length to 100 
> words before training (i.e., running GIZA++ on it). So what else could be the 
> issue? Many mismatch errors exist because one missing sentence throws future 
> alignments off by 1.
>
> Thanks for any help you can offer. I've put a segment of the "training.out" 
> file at the bottom, with [...] indicating that there were many lines that I 
> did not copy & paste because of the vast repetition.
>
> Sincerely,
> Daniel Seita
>
>
> [...]
>   0.0114939 j:74 i:8;  NP 6.31447e-06 AP1 0.01262 j:75 i:6;  NP 3.15751e-06 
> AP1 0.0110491 j:76 i:12;  NP 3.15751e-06 AP1 0.0106323 j:77 i:11;  NP 
> 3.15751e-06 AP1 0.535517 j:78 i:12;  NP 3.15751e-06 AP1 0.0104985 j:79 i:6;  
> NP 0.358595 AP1 0.0117257 j:80 i:4;  NP 2.10507e-06 AP1 0.0104082 j:81 i:11;  
> NP 0.000320498 AP1 0.011022 j:82 i:6;  NP 1.07495e-05 AP1 0.0112897 AP2 
> 0.02392 j:83 i:11;
> WARNING: Hill Climbing yielded a zero score viterbi alignment for the 
> following pair:
> AL(l:15,m:84)(a: 14 0 14 14 14 14 14 0 0 14 7 14 0 14 6 1 0 5 0 0 5 0 0 15 15 
> 15 1 15 6 15 1 15 15 5 15 15 7 11 8 3 2 3 5 4 13 2 2 5 1 2 1 2 2 3 1 1 3 3 1 
> 2 11 3 1 5 13 7 5 6 5 2 7 7 7 2 9 7 13 12 13 7 5 12 7 12 )(fert: 9 9 9 6 1 9 
> 3 9 1 1 0 2 3 4 9 9 )  c:
> Source sentence length : 15 , target : 84
> 20 169 19 92 5 19 4 20 116 29 75 906 89 33643 3
> 116639 6 1069 213 247 5372 24011 17 19 2319 3 5328 6 1112 21 28 6405 5 2 4332 
> 5 24011 17 19 178 7 112 8313 20 1042 106 2 1563 5 37 4189 60 8 2135 29 24011 
> 111119 6 47 72 26118 5603 6 2 970 296 68 12 36604 249 538 700 305 4327 2680 
> 366 305 3288 5 2680 3 2053 24011 6509 12 24011 10624 3 12 16215 201 249 538 
> 700 3288 5 305 4 12
> WARNING: Model2 viterbi alignment has zero score.
> Here are the different elements that made this alignment probability zero
> Source length 15 target length 95
> best: fs[1] 1  : es[1] 1 ,  a: 0.897485 t: 0.00395663 score 0.00355102  
> product : 0.00355102 ss 0
> best: fs[2] 2  : es[10] 10 ,  a: 0.00624286 t: 0.0126491 score 7.89665e-05  
> product : 2.80411e-07 ss 0
> best: fs[3] 3  : es[0] 0 ,  a: 0.0574229 t: 0.120728 score 0.00693258  
> product : 1.94397e-09 ss 0
> best: fs[4] 4  : es[12] 12 ,  a: 0.00893218 t: 0.987925 score 0.00882433  
> product : 1.71542e-11 ss 0
> [...]
> WARNING: Model2 viterbi alignment has zero score.
> Here are the different elements that made this alignment probability zero
> Source length 9 target length 78
> best: fs[1] 1  : es[3] 3 ,  a: 0.0109157 t: 0.970912 score 0.0105982  product 
> : 0.0105982 ss 0
> best: fs[2] 2  : es[2] 2 ,  a: 0.786017 t: 1e-07 score 7.86017e-08  product : 
> 8.33033e-10 ss 0
> best: fs[3] 3  : es[3] 3 ,  a: 0.678615 t: 1e-07 score 6.78615e-08  product : 
> 5.65309e-17 ss 0
> [...]
> Executing: rm -f 
> /home/dseita/KauchakWorking/train/giza.norm-simp/norm-simp.A3.final.gz
> Executing: gzip 
> /home/dseita/KauchakWorking/train/giza.norm-simp/norm-simp.A3.final
> Waiting for second GIZA process...
> (3) generate word alignment @ Mon Jun 25 18:00:13 EDT 2012
> Combining forward and inverted alignment from files:
>    
> /home/dseita/KauchakWorking/train/giza.norm-simp/norm-simp.A3.final.{bz2,gz}
>    
> /home/dseita/KauchakWorking/train/giza.simp-norm/simp-norm.A3.final.{bz2,gz}
> Executing: mkdir -p /home/dseita/KauchakWorking/train/model
> Executing: /home/dseita/mosesdecoder/scripts/training/giza2bal.pl -d "gzip 
> -cd /home/dseita/KauchakWorking/train/giza.simp-norm/simp-norm.A3.final.gz" 
> -i "gzip -cd 
> /home/dseita/KauchakWorking/train/giza.norm-simp/norm-simp.A3.final.gz" 
> |/home/dseita/mosesdecoder/scripts/../bin/symal -alignment="grow" 
> -diagonal="yes" -final="yes" -both="no">  
> /home/dseita/KauchakWorking/train/model/aligned.grow-diag-final
> symal: computing grow alignment: diagonal (1) final (1)both-uncovered (0)
> Sentence mismatch error! Line #86
> Sentence mismatch error! Line #87
> Sentence mismatch error! Line #88
> Sentence mismatch error! Line #89
> Sentence mismatch error! Line #90
> Sentence mismatch error! Line #91
> Sentence mismatch error! Line #92
> Sentence mismatch error! Line #93
> Sentence mismatch error! Line #94
> [...Mismatch errors continue...]
> [...]
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
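
To find which sentence pair is missing from one of the two alignment files, something along these lines might help. It is only a sketch: it assumes each entry in the A3.final files starts with a header line of the form "# Sentence pair (N) ...", which is what GIZA++ output usually looks like but is not shown in the log above. The names pair_ids and missing_ids are made up for this example.

```python
import re

# Assumed header format of each entry in a GIZA++ A3.final file.
PAIR_RE = re.compile(r"^# Sentence pair \((\d+)\)")


def pair_ids(lines):
    """Collect the sentence pair numbers found in the header lines."""
    ids = []
    for line in lines:
        match = PAIR_RE.match(line)
        if match:
            ids.append(int(match.group(1)))
    return ids


def missing_ids(ids_a, ids_b):
    """Return the pair numbers present in ids_a but absent from ids_b, sorted."""
    return sorted(set(ids_a) - set(ids_b))
```

For the gzipped files named in the log, the lines can be read with gzip.open(path, "rt") from the standard library and passed to pair_ids. Calling missing_ids in both directions shows which direction dropped a pair; the smallest missing number should sit near the first "Sentence mismatch error" line.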