We do both: NIST BLEU, as well as Moses tokenizer.perl followed by the Moses scorer (the same one MERT uses). They obviously come out differently.
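Here is a minimal Python sketch of why the two pipelines come out differently. It scores the same detokenized hypothesis under three simplified tokenizers. It is not the NIST or Moses code. The tokenizer functions, the example sentences, and the unsmoothed single-reference sentence BLEU are all illustrative assumptions, and the real mteval and tokenizer.perl rules are more involved.

```python
import math
import re
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hyp, ref, max_n=4):
    """Single-reference sentence BLEU: uniform weights, no smoothing."""
    log_precision = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        total = sum(hyp_counts.values())
        # Clip each hypothesis n-gram count by its count in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if matches == 0:
            return 0.0
        log_precision += math.log(matches / total) / max_n
    # Brevity penalty: only hypotheses that are not longer get penalized.
    c, r = len(hyp), len(ref)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision)


def split_on_spaces(text):
    # Trust whatever tokenization the text already has
    # (the behaviour the thread attributes to multi-bleu.perl).
    return text.split()


def split_punct_keep_apostrophes(text):
    # Hypothetical simplification: separate punctuation but leave
    # apostrophes alone, so "we're" stays a single token.
    return re.sub(r"([^\w\s'])", r" \1 ", text).split()


def split_punct_and_contractions(text):
    # Hypothetical simplification loosely in the spirit of an English
    # tokenizer that also splits "We're" into "We 're".
    text = re.sub(r"([^\w\s'])", r" \1 ", text)
    return re.sub(r"(\w)'(\w)", r"\1 '\2", text).split()


hyp = "We're sure the talks will resume next week."
ref = "We're sure that the talks will resume next week."

for tokenize in (split_on_spaces, split_punct_keep_apostrophes,
                 split_punct_and_contractions):
    score = sentence_bleu(tokenize(hyp), tokenize(ref))
    print(f"{tokenize.__name__:30} {100 * score:.2f}")
```

The hypothesis and reference strings are the same in all three runs. Only the token boundaries change, and the n-gram counts, lengths, and brevity penalty change with them.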
The WMT14 tables somehow report just one of the BLEUs, and I'd have to dig it up to see which one it was. The WMT13 tables are clear (see http://www.statmt.org/wmt13/pdf/WMT02.pdf): NIST's mteval seems marginally better.

O.

----- Original Message -----
> From: "Marcin Junczys-Dowmunt" <[email protected]>
> To: "Ondrej Bojar" <[email protected]>
> Cc: [email protected]
> Sent: Friday, 10 October, 2014 11:30:57 PM
> Subject: Re: [Moses-support] BLEU evaluation at WMT - contractions
>
> Hi Ondrej,
> during the metrics task, is your baseline BLEU score also calculated by
> the NIST script, with all its consequences?
>
> Marcin
>
> On 10.10.2014 22:28, Ondrej Bojar wrote:
> > Hi,
> >
> > I find only one good thing about the standard NIST script: that it is
> > beyond the control of any of us. ;-) It could be better only if the
> > code were not available. Then we'd have a truly black-box measure. ;-)
> >
> > Yes, you're definitely right that a mismatch in register should not be
> > penalized so much, but for WMT translation, we're actually not looking at
> > automatic scores at all. In some years, I think they were not even
> > reported in the paper. That's what the metrics task is for: to promote
> > metrics that look at just the right things.
> >
> > Cheers, O.
> >
> > ----- Original Message -----
> >> From: "Marcin Junczys-Dowmunt" <[email protected]>
> >> To: "Philipp Koehn" <[email protected]>
> >> Cc: [email protected]
> >> Sent: Friday, 10 October, 2014 6:59:03 PM
> >> Subject: Re: [Moses-support] BLEU evaluation at WMT - contractions
> >>
> >> Thanks for the quick answer.
> >>
> >> I admire the stoicism :) I find it painful to see that contractions are
> >> not handled by the official script. You get two errors for not hitting
> >> "we" and "are" when you have "we're", which is actually the same (modulo
> >> style). Also, I guess the news domain has fewer issues with contractions;
> >> otherwise you might have heard more complaints.
> >> Unfortunately, I have to
> >> provide results in WMT style, so there is no way around that script.
> >> METEOR does it right, by the way.
> >>
> >> On 10.10.2014 17:44, Philipp Koehn wrote:
> >>> Hi,
> >>>
> >>> there are a lot of issues with tokenization.
> >>>
> >>> The BLEU scores we report in WMT are computed with the standard NIST script,
> >>> which expects detokenized and properly cased output. The script does
> >>> its own internal tokenization; we just accept that.
> >>>
> >>> Another way to compute BLEU scores is with multi-bleu.perl, which
> >>> completely accepts your tokenization.
> >>>
> >>> -phi
> >>>
> >>>
> >>> On Fri, Oct 10, 2014 at 11:12 AM, Marcin Junczys-Dowmunt
> >>> <[email protected]> wrote:
> >>>
> >>> Hi,
> >>>
> >>> slightly off-topic: I have a question concerning the evaluation
> >>> practice during WMT. I have noticed that the standard NIST script
> >>> mteval-v1.3a.pl (or any other version)
> >>> does not split on apostrophes for English contractions. How was
> >>> this handled during WMT? Did you use the official NIST scripts
> >>> for BLEU calculation after detokenization? If yes, this would
> >>> severely penalize the use of contractions over non-contracted
> >>> forms (around 2-3% BLEU). Is this just generally accepted?
> >>>
> >>> Thanks,
> >>>
> >>> Marcin
> >>>
> >>>
> >>> _______________________________________________
> >>> Moses-support mailing list
> >>> [email protected]
> >>> http://mailman.mit.edu/mailman/listinfo/moses-support
> >>>

--
Ondrej Bojar (mailto:[email protected] / [email protected])
http://www.cuni.cz/~obo
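The contraction penalty Marcin describes can be reproduced at the unigram level with a small Python sketch. The tokenizer is a simplified stand-in that, like the NIST script as described in this thread, does not split on apostrophes. The `EXPANSIONS` table and `expand_contractions` helper are hypothetical and only show what a contraction-aware normalization would change. They are not part of any WMT or NIST tooling.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny contraction table for illustration only.
# Real contractions are ambiguous ("he's" = "he is" or "he has").
EXPANSIONS = {"we're": "we are", "don't": "do not", "it's": "it is"}


def tokenize(text):
    # Simplified stand-in: separate punctuation but not apostrophes,
    # so "we're" stays a single token.
    return re.sub(r"([^\w\s'])", r" \1 ", text).split()


def expand_contractions(tokens):
    # Replace each listed contraction with its expanded words.
    out = []
    for tok in tokens:
        out.extend(EXPANSIONS.get(tok, tok).split())
    return out


def unigram_stats(hyp, ref):
    """Return (clipped unigram matches, hypothesis length, reference length)."""
    hyp_counts, ref_counts = Counter(hyp), Counter(ref)
    matches = sum(min(c, ref_counts[w]) for w, c in hyp_counts.items())
    return matches, len(hyp), len(ref)


hyp = "we're sure the talks will resume."
ref = "we are sure the talks will resume."

print(unigram_stats(tokenize(hyp), tokenize(ref)))
print(unigram_stats(expand_contractions(tokenize(hyp)),
                    expand_contractions(tokenize(ref))))
```

Without expansion, 6 of the 7 hypothesis tokens match, and the reference tokens `we` and `are` go unmatched. Those are the two misses from the thread. The shorter hypothesis would also draw a brevity penalty in full BLEU. With the hypothetical expansion, all 8 tokens match.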
