Model 3/4 uses the HMM/Model 2 alignment to bootstrap. First, a Viterbi
alignment is computed with Model 2 or the HMM model, and then a
hill-climbing algorithm searches for the optimal alignment under Model
3/4. So even if the initial alignment has some problems, you _may_ still
get a good final alignment. That's why I'm usually not too worried about
the zero-probability Viterbi alignment warning.

As for the cause of the problem: it is usually floating-point underflow.
Unfortunately, GIZA++ works with raw probabilities rather than log
probabilities, so the product of per-word scores can underflow to zero
when the sentence is too long. That's why the sentence length limit is
set to 100.

In our in-house experiments, changing float and double to long double
reduced the chance of underflow and supported much longer sentences; we
tried 500 before. However, the cost you pay is speed.

--Q



On Tue, Jul 3, 2012 at 5:46 AM, Patricia Helmich <
[email protected]> wrote:

>  Hi,
>
> I am trying to train phrase models for several language pairs. Before
> training the phrase models, I cleaned the corpora with the moses clean
> script, so sentences with a length >60 were filtered out. This worked for
> several corpora. For a few corpora, I got "WARNING: Model2 viterbi
> alignment has zero score."
> I found that another person solved the problem by reducing the length of
> the sentences, so I reduced the sentence length to 50 for these
> corpora. This worked for the problematic corpora except for one corpus
> pair. For this corpus pair, I had to reduce the sentence length to
> 30 before it finally worked. By reducing the length to 30, I'm losing a
> large number of sentences from my corpora. That's why I was wondering
> what the reason for this warning is, and why for some language pairs it
> works with longer sentences while for others it doesn't.
> I also checked the ratio of 9:1.
> Can you imagine any reason for this warning? And, since it is marked as a
> warning, not as an error, is it necessary to remove it?
> It would be very kind if you could give me some information about this
> problem.
>
> Thank you,
> Patricia
>
>
> Extract from the logfile:
>
>    406  THTo3: Iteration 1
>    407  Reading more sentence pairs into memory ...
>    408  WARNING: Model2 viterbi alignment has zero score.
>    409  Here are the different elements that made this alignment
> probability zero
>    410  Source length 4 target length 35
>    411  best: fs[1] 1  : es[3] 3 ,  a: 0.13803 t: 0.870283 score 0.120125
>  product : 0.120125 ss 0
>    412  best: fs[2] 2  : es[1] 1 ,  a: 0.350718 t: 0.221544 score
> 0.0776995  product : 0.00933363 ss 0
>    413  best: fs[3] 3  : es[1] 1 ,  a: 0.150805 t: 0.324392 score
> 0.0489198  product : 0.000456599 ss 0
>    414  best: fs[4] 4  : es[1] 1 ,  a: 0.0606276 t: 0.324392 score
> 0.0196671  product : 8.97998e-06 ss 0
>    415  best: fs[5] 5  : es[1] 1 ,  a: 0.037479 t: 0.324392 score
> 0.0121579  product : 1.09178e-07 ss 0
>    416  best: fs[6] 6  : es[1] 1 ,  a: 0.021535 t: 0.324392 score
> 0.0069858  product : 7.62692e-10 ss 0
>    417  best: fs[7] 7  : es[1] 1 ,  a: 0.041835 t: 0.324392 score
> 0.0135709  product : 1.03505e-11 ss 0
>    418  best: fs[8] 8  : es[1] 1 ,  a: 0.12501 t: 0.324392 score 0.0405522
>  product : 4.19734e-13 ss 0
>    419  best: fs[9] 9  : es[1] 1 ,  a: 0.333332 t: 0.324392 score 0.10813
>  product : 4.5386e-14 ss 0
>    420  best: fs[10] 10  : es[1] 1 ,  a: 0.999996 t: 0.324392 score
> 0.324391  product : 1.47228e-14 ss 0
>    421  best: fs[11] 11  : es[1] 1 ,  a: 0.999996 t: 0.324392 score
> 0.324391  product : 4.77594e-15 ss 0
>    422  best: fs[12] 12  : es[1] 1 ,  a: 0.999996 t: 0.324392 score
> 0.324391  product : 1.54927e-15 ss 0
>    423  best: fs[13] 13  : es[1] 1 ,  a: 0.999996 t: 0.324392 score
> 0.324391  product : 5.0257e-16 ss 0
>    424  best: fs[14] 14  : es[1] 1 ,  a: 0.999996 t: 0.324392 score
> 0.324391  product : 1.63029e-16 ss 0
>    425  best: fs[15] 15  : es[1] 1 ,  a: 0.999996 t: 0.324392 score
> 0.324391  product : 5.28852e-17 ss 0
>    426  best: fs[16] 16  : es[1] 1 ,  a: 0.999996 t: 0.324392 score
> 0.324391  product : 1.71555e-17 ss 0
>    427  best: fs[17] 17  : es[1] 1 ,  a: 0.999996 t: 0.324392 score
> 0.324391  product : 5.56508e-18 ss 0
>    428  best: fs[18] 18  : es[1] 1 ,  a: 0.999996 t: 0.324392 score
> 0.324391  product : 1.80526e-18 ss 0
>    429  best: fs[19] 19  : es[1] 1 ,  a: 0.999996 t: 0.324392 score
> 0.324391  product : 5.85611e-19 ss 0
>    430  best: fs[20] 20  : es[1] 1 ,  a: 0.999996 t: 0.324392 score
> 0.324391  product : 1.89967e-19 ss 0
>    431  best: fs[21] 21  : es[1] 1 ,  a: 0.999996 t: 0.324392 score
> 0.324391  product : 6.16235e-20 ss 0
>    432  best: fs[22] 22  : es[1] 1 ,  a: 0.999996 t: 0.324392 score
> 0.324391  product : 1.99901e-20 ss 0
>    433  best: fs[23] 23  : es[1] 1 ,  a: 0.999996 t: 0.324392 score
> 0.324391  product : 6.48461e-21 ss 0
>    434  best: fs[24] 24  : es[1] 1 ,  a: 0.999996 t: 0.324392 score
> 0.324391  product : 2.10355e-21 ss 0
>    435  best: fs[25] 25  : es[1] 1 ,  a: 0.999996 t: 0.324392 score
> 0.324391  product : 6.82372e-22 ss 0
>    436  best: fs[26] 26  : es[1] 1 ,  a: 0.999996 t: 0.324392 score
> 0.324391  product : 2.21355e-22 ss 0
>    437  best: fs[27] 27  : es[1] 1 ,  a: 0.999996 t: 0.324392 score
> 0.324391  product : 7.18057e-23 ss 0
>    438  best: fs[28] 28  : es[1] 1 ,  a: 0.999996 t: 0.324392 score
> 0.324391  product : 2.32931e-23 ss 0
>    439  best: fs[29] 29  : es[1] 1 ,  a: 0.999996 t: 0.324392 score
> 0.324391  product : 7.55608e-24 ss 0
>    440  best: fs[30] 30  : es[1] 1 ,  a: 0.999996 t: 0.324392 score
> 0.324391  product : 2.45112e-24 ss 0
>    441  best: fs[31] 31  : es[1] 1 ,  a: 0.999996 t: 0.324392 score
> 0.324391  product : 7.95122e-25 ss 0
>    442  best: fs[32] 32  : es[1] 1 ,  a: 0.999996 t: 0.324392 score
> 0.324391  product : 2.5793e-25 ss 0
>    443  best: fs[33] 33  : es[1] 1 ,  a: 0.999996 t: 0.324392 score
> 0.324391  product : 8.36703e-26 ss 0
>    444  best: fs[34] 34  : es[1] 1 ,  a: 0.999996 t: 0.324392 score
> 0.324391  product : 2.71419e-26 ss 0
>    445  best: fs[35] 35  : es[1] 1 ,  a: 0.99992 t: 0.0101365 score
> 0.0101357  product : 2.75101e-28 ss 0
>    446  Fert[0] selected 9
>    447  Fert[1] selected 9
>    448  Fert[2] selected 0
>    449  Fert[3] selected 9
>    450  Fert[4] selected 8
>    451  10000
>    452  20000
>    453  30000
>    454  40000
>    455  50000
>    456  Reading more sentence pairs into memory ...
>    457  Reading more sentence pairs into memory ...
>    458  #centers(pre/hillclimbed/real): 1 1 1  #al: 1075.58
> #alsophisticatedcountcollection: 0 #hcsteps: 0
>    459  #peggingImprovements: 0
>    460  A/D table contains 104118 parameters.
>    461  A/D table contains 104094 parameters.
>    462  NTable contains 397690 parameter.
>    463  p0_count is 1.09339e+06 and p1 is 113340; p0 is 0.999 p1: 0.001
>    464  THTo3: TRAIN CROSS-ENTROPY 4.26144 PERPLEXITY 19.1788
>    465  THTo3: (1) TRAIN VITERBI CROSS-ENTROPY 4.34002 PERPLEXITY 20.2523
>    466
>    467  THTo3 Viterbi Iteration : 1 took: 44 seconds
>
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
