If you're interested in statistical significance testing, you really ought to read the Clark et al. (2011) paper (http://www.cs.cmu.edu/~jhclark/pubs/significance.pdf). We showed that the Koehn technique and related methods can indicate significance for reasons that have little to do with the experimental manipulation being tested. In particular, each time MERT (or virtually any other optimizer) is run, you get a different system out, and these differences can be "significant". With a bit more work, it is possible to control for these effects, but there is no easy fix for the statistical reliability problem in MT in general. We are experimenting on top of a very unstable foundation. When it's practical, hypothesis testing can help, but it is more important that we, as a field, understand the limits of what it can do.

Best,
Chris

On Thu, Jan 24, 2013 at 6:42 AM, Germán Sanchis Trilles <[email protected]> wrote:
> Hi all,
>
> personally I have an implementation of Koehn's 2004 ACL paper about
> statistical significance tests for MT evaluation. It implements both
> "stand-alone confidence intervals" (Sec. 5, bootstrap resampling) and paired
> bootstrap resampling, if a baseline is given. Right now, it computes
> confidence intervals for both TER and BLEU (including brevity penalty) using
> modified versions of multi-bleu.perl and tercom.jar which are packaged into
> the script itself, so that the resampling is performed on the TER and BLEU
> counts (instead of the sentences, which is extremely costly). I have been
> using it for some years now, so it should be relatively robust. It
> implements bootstrap resampling for a given set of translations, i.e., it
> does not take into account optimizer instability.
>
> If it is of any interest to the Moses project, I have no problem whatsoever
> donating it to the MT community ;)
>
> Cheers,
>
> Germán
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
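
For readers unfamiliar with the technique, paired bootstrap resampling over precomputed BLEU counts, as described in the quoted message, might look roughly like the sketch below. This is not Germán's script. The function names (bleu_from_counts, sum_stats, paired_bootstrap) and the 10-number per-sentence layout are assumptions made for illustration, and a single reference per sentence is assumed.

```python
import math
import random

# Hypothetical per-sentence layout (an assumption for this sketch):
# [match1, total1, match2, total2, match3, total3, match4, total4,
#  hyp_len, ref_len]

def bleu_from_counts(stats):
    """Corpus BLEU (with brevity penalty) from summed n-gram counts."""
    matches = stats[0:8:2]
    totals = stats[1:8:2]
    hyp_len, ref_len = stats[8], stats[9]
    if hyp_len == 0 or any(m == 0 for m in matches):
        return 0.0
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / 4
    log_bp = min(0.0, 1.0 - ref_len / hyp_len)  # log of the brevity penalty
    return math.exp(log_prec + log_bp)

def sum_stats(rows, indices):
    """Sum the per-sentence counts over a (re)sampled list of indices."""
    return [sum(rows[i][k] for i in indices) for k in range(10)]

def paired_bootstrap(base_rows, sys_rows, n_samples=1000, seed=0):
    """Fraction of resampled test sets on which the system beats the baseline."""
    assert len(base_rows) == len(sys_rows)
    rng = random.Random(seed)
    n = len(base_rows)
    wins = 0
    for _ in range(n_samples):
        # Draw sentence indices with replacement; use the SAME indices
        # for both systems so the comparison is paired.
        idx = [rng.randrange(n) for _ in range(n)]
        if bleu_from_counts(sum_stats(sys_rows, idx)) > \
           bleu_from_counts(sum_stats(base_rows, idx)):
            wins += 1
    return wins / n_samples
```

Because only the stored counts are summed for each resample, no sentence is rescored, which is what keeps the procedure cheap. Note that this resamples one fixed set of translations per system, so it does not account for the optimizer instability discussed in the reply above.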
