Re: [Moses-support] statistical significance tests

Germán Sanchis Trilles Thu, 24 Jan 2013 03:41:03 -0800

Hi all,

personally, I have an implementation of Koehn's 2004 EMNLP paper about statistical significance tests for MT evaluation. It implements both "stand-alone confidence intervals" (sec. 5, bootstrap resampling) and paired bootstrap resampling, if a baseline is given. Right now, it computes confidence intervals for both TER and BLEU (including the brevity penalty) using modified versions of multi-bleu.perl and tercom.jar, which are packaged into the script itself, so that the resampling is performed on the per-sentence TER and BLEU counts (instead of re-scoring the resampled sentences, which is extremely costly). I have been using it for some years now, so it should be relatively robust. Note that it implements bootstrap resampling for a given set of translations, i.e., it does not take optimizer instability into account.
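
To illustrate what resampling on counts means, here is a minimal Python sketch. It is not the script itself: all function names are made up for this example, it assumes a single reference, and it assumes that the per-sentence BLEU statistics (4 n-gram match counts, 4 n-gram totals, hypothesis length, reference length) have already been extracted once. Each bootstrap sample then only needs to sum these small vectors rather than re-score any sentences.

```python
import math
import random

# Hypothetical per-sentence layout (an assumption of this sketch):
# [match1..match4, total1..total4, hyp_len, ref_len]
N_FIELDS = 10


def bleu_from_counts(c):
    """Corpus BLEU (with brevity penalty) from summed counts."""
    matches, totals, hyp_len, ref_len = c[0:4], c[4:8], c[8], c[9]
    if hyp_len == 0 or min(matches) == 0 or min(totals) == 0:
        return 0.0
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / 4
    log_bp = min(0.0, 1.0 - ref_len / hyp_len)  # log of the brevity penalty
    return math.exp(log_prec + log_bp)


def sum_counts(stats, idx):
    """Sum the count vectors of the sentences selected by idx."""
    return [sum(stats[i][k] for i in idx) for k in range(N_FIELDS)]


def bootstrap_ci(stats, n_samples=1000, alpha=0.05, seed=0):
    """Stand-alone confidence interval: resample sentences with replacement."""
    rng = random.Random(seed)
    n = len(stats)
    scores = []
    for _ in range(n_samples):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(bleu_from_counts(sum_counts(stats, idx)))
    scores.sort()
    lo = scores[int(n_samples * alpha / 2)]
    hi = scores[max(0, int(n_samples * (1 - alpha / 2)) - 1)]
    return lo, hi


def paired_bootstrap(sys_stats, base_stats, n_samples=1000, seed=0):
    """Fraction of samples in which the system beats the baseline.

    The same sentence indices are drawn for both systems in each sample.
    """
    assert len(sys_stats) == len(base_stats)
    rng = random.Random(seed)
    n = len(sys_stats)
    wins = 0
    for _ in range(n_samples):
        idx = [rng.randrange(n) for _ in range(n)]
        sys_bleu = bleu_from_counts(sum_counts(sys_stats, idx))
        base_bleu = bleu_from_counts(sum_counts(base_stats, idx))
        if sys_bleu > base_bleu:
            wins += 1
    return wins / n_samples
```

With 1000 samples, a returned fraction of 0.95 or more from paired_bootstrap would correspond to the system being better than the baseline at the 95% level in the sense of the paper. The same scheme should carry over to TER by swapping the count vector (edits and reference length) and the scoring function.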

If it is of any interest to the Moses project, I have no problem whatsoever donating it to the MT community ;)


Cheers,

Germán

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
