Greetings,

Does anyone know of any tools for measuring the statistical
significance of BLEU scores? I am aware of the work using
bootstrapping in "Minimum Error Rate Training in Statistical Machine
Translation" [Och, 2003], "Statistical Significance Tests for Machine
Translation Evaluation" [Koehn, 2004], and "Measuring Confidence
Intervals for the Machine Translation Evaluation Metrics" [Zhang et al.,
2004]. What I am looking for is a usable implementation.

Thank you in advance,

Eric Nichols
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
