[Moses-support] Statistical significance test

Baskaran Sankaran Mon, 08 Apr 2013 21:14:15 -0700

Hi group,

I need to compute statistical significance between a pair of system outputs,
and I've used the bootstrap resampling script in Moses. Unfortunately, the
BLEU scores from this script differ substantially (about 1.5 points lower)
from those of the standard mteval script. I've also tried applying the same
text normalization routine from mteval inside the bootstrap resampling script
(modifying the script a bit so that it normalizes both hyps and refs), but
the scores are still different.
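For reference, the normalization mteval applies before counting n-grams can be
sketched roughly as below. This is a minimal Python approximation, assuming
mteval-v13a's NormalizeText rules. It is not the mteval script itself, and
the function name normalize_v13a and its lowercase flag are hypothetical
stand-ins for mteval's own case option.

```python
import re
import string

# Punctuation characters that mteval-v13a pads with spaces (assumed rules):
# { | } ~  [ \ ] ^ _ `  (space) ! " # $ % &  ( ) * +  : ; < = > ? @  /
_PUNCT = re.compile(r"([\{-\~\[-\` -\&\(-\+\:-\@\/])")
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def normalize_v13a(text: str, lowercase: bool = False) -> str:
    """Approximate mteval-v13a text normalization (hypothetical helper)."""
    # Language-independent part: tags, hyphenated line breaks, SGML entities.
    text = text.replace("<skipped>", "")
    text = text.replace("-\n", "").replace("\n", " ")
    text = text.replace("&quot;", '"').replace("&amp;", "&")
    text = text.replace("&lt;", "<").replace("&gt;", ">")
    # Language-dependent part: pad with spaces, optionally lowercase ASCII only.
    text = f" {text} "
    if lowercase:
        text = text.translate(_ASCII_LOWER)
    text = _PUNCT.sub(r" \1 ", text)
    # Split off periods and commas unless they sit next to a digit.
    text = re.sub(r"([^0-9])([\.,])", r"\1 \2 ", text)
    text = re.sub(r"([\.,])([^0-9])", r" \1 \2", text)
    # Split a dash that follows a digit.
    text = re.sub(r"([0-9])(-)", r"\1 \2 ", text)
    # Collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()
```

For example, normalize_v13a("It costs 3.5 dollars.") splits off the final
period but leaves the decimal point in 3.5 intact. The same function has to be
applied to hyps and refs alike; otherwise n-gram matches are lost.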


The problem is that the Moses bootstrap script reports one system output as
significantly better than a baseline (an absolute BLEU difference of 0.3),
but the mteval BLEU difference between those two systems is only 0.1.
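For context, the paired bootstrap test scores both systems on the same
resampled test sets, so its verdict inherits whatever BLEU implementation it
uses. The following is a minimal Python sketch of that technique. It assumes
whitespace-tokenized, already-normalized text, a single reference per segment,
and unsmoothed corpus BLEU. The function names are hypothetical, and this is
not the Moses script or mteval's scorer. Each resample draws segment indices
with replacement, and the sketch counts how often system A beats system B.

```python
import math
import random
from collections import Counter
from typing import List, Sequence

MAX_N = 4


def sentence_stats(hyp: str, ref: str) -> List[int]:
    """[hyp_len, ref_len, match_1, total_1, ..., match_4, total_4]."""
    h, r = hyp.split(), ref.split()
    stats = [len(h), len(r)]
    for n in range(1, MAX_N + 1):
        h_ngrams = Counter(tuple(h[i:i + n]) for i in range(len(h) - n + 1))
        r_ngrams = Counter(tuple(r[i:i + n]) for i in range(len(r) - n + 1))
        matches = sum((h_ngrams & r_ngrams).values())  # clipped n-gram counts
        stats += [matches, max(len(h) - n + 1, 0)]
    return stats


def corpus_bleu(stats: Sequence[Sequence[int]]) -> float:
    """Corpus BLEU from summed per-segment statistics (no smoothing)."""
    totals = [sum(col) for col in zip(*stats)]
    hyp_len, ref_len = totals[0], totals[1]
    log_prec = 0.0
    for n in range(MAX_N):
        matches, total = totals[2 + 2 * n], totals[3 + 2 * n]
        if matches == 0 or total == 0:
            return 0.0  # any zero n-gram precision makes unsmoothed BLEU 0
        log_prec += math.log(matches / total) / MAX_N
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_prec)


def paired_bootstrap(stats_a, stats_b, samples: int = 1000, seed: int = 0) -> float:
    """Fraction of resampled test sets on which system A strictly beats B."""
    assert len(stats_a) == len(stats_b)
    rng = random.Random(seed)
    size = len(stats_a)
    wins = 0
    for _ in range(samples):
        idx = [rng.randrange(size) for _ in range(size)]  # same draw for both
        if corpus_bleu([stats_a[i] for i in idx]) > corpus_bleu([stats_b[i] for i in idx]):
            wins += 1
    return wins / samples
```

Under this test, a win rate of at least 0.95 would count as A being better at
roughly p < 0.05. If the BLEU computed inside the test differs from mteval's,
both the absolute scores and the win rate can differ from what the mteval
numbers suggest.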

I know multeval is an option, but again the scores are different and it
doesn't do normalization. Any suggestions?

Thanks
- Baskaran

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
