Hello,

I've just run across a problem with the estimation of lexical reordering
tables. In train-factored-phrase-model.perl, the reordering
probabilities are written out using a printf %.5f format specifier.
Given a large enough corpus containing sufficiently small quantities of
a certain phrase in a certain reordering condition, the probability
estimate of this phrase/condition pair can be small enough to be rounded
down to zero despite smoothing. In the decoder, the resulting zero
probabilities probably get converted into a -Infinity logprob, which causes
all kinds of havoc, including lots of phrases with infinite or NaN scores.
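To illustrate the failure mode, here is a minimal Python sketch. It assumes Python's `%` formatting behaves like the C/Perl `printf` used in the training script, and that the decoder takes a natural log with C semantics (`log(0.0)` is `-inf`). The probability value is made up for illustration, and `c_log` is a hypothetical helper, since Python's own `math.log(0.0)` raises instead of returning `-inf`.

```python
import math

def c_log(p: float) -> float:
    """Natural log with C semantics: log(0.0) is -inf (Python's math.log raises)."""
    return math.log(p) if p > 0.0 else float("-inf")

# A small but non-zero smoothed reordering probability (hypothetical value).
p = 3.2e-6

written = "%.5f" % p       # written with %.5f, as in the training script
reloaded = float(written)  # the value read back when the table is loaded

print(written, reloaded, c_log(reloaded))
```

The non-zero estimate comes back as exactly `0.0`, and its log is `-inf`. Once `-inf` is combined with other infinite terms (e.g. `-inf - -inf`), NaN scores follow.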

Suggested fix: Use %g instead of %.5f everywhere in the subs
store_reordering_f and store_reordering_fe.
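The difference between the two format specifiers can be checked with a short Python sketch (again assuming Python's `%` follows the same C `printf` rules as Perl's). `%.5f` keeps a fixed five decimal places, so anything below 0.000005 becomes zero; `%g` keeps six significant digits and switches to exponent notation for small values, so a positive probability stays positive. The sample probabilities are hypothetical.

```python
# Hypothetical probabilities spanning several orders of magnitude.
probs = [0.5, 0.123456, 3.2e-6, 1.7e-12]

def round_trip(p: float, fmt: str) -> float:
    """Format p with a printf-style specifier and parse it back."""
    return float(fmt % p)

for p in probs:
    print(p, "%.5f" % p, "%g" % p)
```

With `%g`, `3.2e-6` is written as `3.2e-06` and survives the round trip; with `%.5f` it is written as `0.00000`.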

Are there any cases where zero probabilities actually make sense, or
could they be filtered out when the tables are loaded?
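One possible load-time safeguard is sketched below in Python. This is not how the decoder currently behaves. It assumes a `src ||| tgt ||| p1 p2 ...` line layout, and the floor value `PROB_FLOOR` is a hypothetical constant. Rather than dropping an entry (which would also discard its other, non-zero orientation scores), it clamps zero probabilities to a small floor before taking logs.

```python
import math

# Hypothetical floor; any small positive constant keeps the logs finite.
PROB_FLOOR = 1e-10

def parse_reordering_line(line: str, floor: float = PROB_FLOOR):
    """Parse one 'src ||| tgt ||| p1 p2 ...' line (assumed layout) and
    return (src, tgt, log_scores), clamping zero probabilities to `floor`."""
    src, tgt, scores = (field.strip() for field in line.split("|||"))
    probs = [float(tok) for tok in scores.split()]
    log_scores = [math.log(max(p, floor)) for p in probs]
    return src, tgt, log_scores

src, tgt, logs = parse_reordering_line(
    "das haus ||| the house ||| 0.60000 0.00000 0.40000 0.50000 0.30000 0.20000")
print(src, tgt, logs)
```

Every score comes back finite; the zero entry becomes `log(1e-10)` instead of `-inf`.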

Best,
Christian
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support