Greetings,

Does anyone know of any tools for measuring the statistical significance of BLEU scores? I am aware of the bootstrapping work in "Minimum Error Rate Training in Statistical Machine Translation" [Och, 2003], "Statistical Significance Tests for Machine Translation Evaluation" [Koehn, 2004], and "Measuring Confidence Intervals for the Machine Translation Evaluation Metrics" [Zhang et al., 2004]. What I am looking for is a usable implementation.
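To make the request concrete, here is a minimal Python sketch of paired bootstrap resampling as I understand it from [Koehn, 2004]. It is not taken from Moses or any existing tool. The function names (`sentence_stats`, `corpus_bleu`, `paired_bootstrap`) are my own, and it assumes a single reference per sentence, whitespace tokenization, and unsmoothed BLEU with n-grams up to 4.

```python
import math
import random
from collections import Counter

MAX_N = 4  # assumption: standard BLEU-4


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_stats(hyp, ref):
    """Per-sentence sufficient statistics: lengths, then (matches, total) per order."""
    h, r = hyp.split(), ref.split()
    stats = [len(h), len(r)]
    for n in range(1, MAX_N + 1):
        clipped = ngrams(h, n) & ngrams(r, n)  # clipped n-gram matches
        stats += [sum(clipped.values()), max(len(h) - n + 1, 0)]
    return stats


def corpus_bleu(stats_list):
    """Unsmoothed corpus BLEU from summed per-sentence statistics."""
    totals = [sum(col) for col in zip(*stats_list)]
    hyp_len, ref_len = totals[0], totals[1]
    if hyp_len == 0:
        return 0.0
    log_prec = 0.0
    for n in range(MAX_N):
        matches, total = totals[2 + 2 * n], totals[3 + 2 * n]
        if matches == 0 or total == 0:
            return 0.0
        log_prec += math.log(matches / total) / MAX_N
    log_bp = min(0.0, 1.0 - ref_len / hyp_len)  # log of the brevity penalty
    return math.exp(log_bp + log_prec)


def paired_bootstrap(hyps_a, hyps_b, refs, n_samples=1000, seed=0):
    """Fraction of resampled test sets on which system A scores above system B."""
    rng = random.Random(seed)
    stats_a = [sentence_stats(h, r) for h, r in zip(hyps_a, refs)]
    stats_b = [sentence_stats(h, r) for h, r in zip(hyps_b, refs)]
    n = len(refs)
    wins_a = 0
    for _ in range(n_samples):
        # Draw n sentence indices with replacement; use the same draw for both systems.
        idx = [rng.randrange(n) for _ in range(n)]
        bleu_a = corpus_bleu([stats_a[i] for i in idx])
        bleu_b = corpus_bleu([stats_b[i] for i in idx])
        if bleu_a > bleu_b:
            wins_a += 1
    return wins_a / n_samples
```

If the returned fraction is at least 0.95, my reading of the paper is that system A would be called better than system B at the 95% level. Precomputing the per-sentence statistics keeps each resample cheap, since only sums are recomputed. I would much prefer a tested tool over maintaining something like this myself.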
Thank you in advance,
Eric Nichols
