If you're interested in statistical significance testing, you really
ought to read the Clark et al. (2011) paper
(http://www.cs.cmu.edu/~jhclark/pubs/significance.pdf). We showed that
the Koehn technique and related methods can indicate significance for
reasons that have little to do with the experimental manipulation that
is being tested--in particular, each time MERT (or virtually any other
optimizer) is run, you get a different system out, and these
differences can be "significant". With a bit more work, it is possible
to control for these effects, but there is no easy fix for the
statistical reliability problem in MT in general. We are
experimenting on top of a very unstable foundation. When it's
practical, hypothesis testing can help, but it is more important that
we, as a field, understand the limits of what it can do.
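
As a rough illustration of the instability point above, one simple check
is to rerun the optimizer several times per configuration and compare
the gap between systems with the run-to-run spread. The sketch below
assumes you already have one BLEU score per independent optimizer run;
the function names and the returned dictionary keys are hypothetical,
and this is not the procedure from the paper, only a descriptive sanity
check.

```python
import statistics


def summarize_runs(scores):
    """Return (mean, sample standard deviation) of one score per optimizer run."""
    if len(scores) < 2:
        raise ValueError("need at least two optimizer runs to estimate spread")
    return statistics.mean(scores), statistics.stdev(scores)


def gap_vs_noise(baseline_runs, system_runs):
    """Compare the mean difference between two systems with run-to-run spread.

    Both arguments are lists of metric scores (e.g. BLEU), one per
    independent optimizer run. This layout is an assumption of the sketch.
    """
    base_mean, base_sd = summarize_runs(baseline_runs)
    sys_mean, sys_sd = summarize_runs(system_runs)
    return {
        "difference": sys_mean - base_mean,
        "baseline_sd": base_sd,
        "system_sd": sys_sd,
    }
```

If the difference is of the same order as either standard deviation, a
significance result obtained from a single optimizer run per system
should probably not be trusted on its own.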
Best,
Chris

On Thu, Jan 24, 2013 at 6:42 AM, Germán Sanchis Trilles
<[email protected]> wrote:
> Hi all,
>
> personally I have an implementation of Koehn's 2004 ACL paper about
> statistical significance tests for MT evaluation. It implements both
> "stand-alone confidence intervals" (sec.5, bootstrap resampling) and paired
> bootstrap resampling, if a baseline is given. Right now, it computes
> confidence intervals for both TER and BLEU (including brev. penalty) using
> modified versions of multi-bleu.perl and tercom.jar which are packaged into
> the script itself, so that the resampling is performed on the TER and BLEU
> counts (instead of the sentences, which is extremely costly). I have been
> using it for some years now, so that it should be relatively robust. It
> implements bootstrap resampling for a given set of translations, i.e., it
> does not take into account optimizer instability.
>
> If it is of any interest to the Moses project, I have no problem whatsoever
> donating it to the MT community ;)
>
> Cheers,
>
> Germán
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
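
The count-based paired bootstrap described in the quoted message can be
sketched roughly as follows. This is a minimal sketch and not the script
offered above: it assumes each sentence has already been reduced to a
list of BLEU counts laid out as [hyp_len, ref_len, match_1, total_1,
..., match_4, total_4]. That layout and all function names are
hypothetical. Resampling then only sums precomputed counts instead of
rescoring sentences, and, as noted in the quoted message, it says
nothing about optimizer instability.

```python
import math
import random

MAX_N = 4  # BLEU n-gram order


def corpus_bleu(stats):
    """BLEU from summed counts [hyp_len, ref_len, match_1, total_1, ...]."""
    hyp_len, ref_len = stats[0], stats[1]
    if hyp_len == 0:
        return 0.0
    log_prec = 0.0
    for n in range(MAX_N):
        match, total = stats[2 + 2 * n], stats[3 + 2 * n]
        if match == 0 or total == 0:
            return 0.0
        log_prec += math.log(match / total) / MAX_N
    # Log of the brevity penalty: 0 when the hypothesis is long enough.
    log_bp = min(0.0, 1.0 - ref_len / hyp_len)
    return math.exp(log_prec + log_bp)


def sum_stats(sent_stats, indices):
    """Add up the per-sentence counts for the sampled sentence indices."""
    total = [0] * len(sent_stats[0])
    for i in indices:
        for k, value in enumerate(sent_stats[i]):
            total[k] += value
    return total


def paired_bootstrap(base_stats, sys_stats, n_samples=1000, seed=0):
    """Fraction of resampled test sets on which the system beats the baseline.

    The same sampled indices are used for both systems, which is what
    makes the test paired.
    """
    if len(base_stats) != len(sys_stats):
        raise ValueError("both systems must cover the same sentences")
    rng = random.Random(seed)
    n = len(base_stats)
    wins = 0
    for _ in range(n_samples):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        if corpus_bleu(sum_stats(sys_stats, idx)) > corpus_bleu(sum_stats(base_stats, idx)):
            wins += 1
    return wins / n_samples
```

A returned fraction of 0.95 or more is the kind of result usually
reported as the system being better at the 95% level, for this one pair
of outputs only.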
