Hi, I find only one good thing about the standard NIST script: that it is beyond the control of any of us. ;-) It could be better only if the code were not available. Then we'd have a truly black-box measure. ;-)
Yes, you're definitely right that a mismatch in register should not be penalized so much, but for WMT translation, we're actually not looking at automatic scores at all. In some years, I think they were not even reported in the paper. That's what the metric task is for: to promote metrics that look at just the right things.

Cheers, O.

----- Original Message -----
> From: "Marcin Junczys-Dowmunt" <[email protected]>
> To: "Philipp Koehn" <[email protected]>
> Cc: [email protected]
> Sent: Friday, 10 October, 2014 6:59:03 PM
> Subject: Re: [Moses-support] BLEU evaluation at WMT - contractions
>
> Thanks for the quick answer.
>
> I admire the stoicism :) I find it painful to see that contractions are
> not handled by the official script. You get two errors for not hitting
> "we" and "are" when you have "we're", which is actually the same (modulo
> style). Also, I guess the news domain has fewer issues with contractions,
> otherwise you might have heard more complaints. Unfortunately I have to
> provide results in WMT-style, so there is no way around that script.
> METEOR does it right, by the way.
>
> On 10.10.2014 17:44, Philipp Koehn wrote:
> > Hi,
> >
> > there are a lot of issues with tokenization.
> >
> > The BLEU scores we report in WMT are using the standard NIST script,
> > which expects detokenized and properly cased output. The script does
> > its own internal tokenization; we just accept that.
> >
> > Another way to compute BLEU scores is with multi-bleu.perl, which
> > completely accepts your tokenization.
> >
> > -phi
> >
> >
> > On Fri, Oct 10, 2014 at 11:12 AM, Marcin Junczys-Dowmunt
> > <[email protected]> wrote:
> >
> > Hi,
> >
> > slightly off-topic: I have a question concerning the evaluation
> > practice during WMT. I have noticed that the standard NIST script
> > mteval-v1.3a.pl (or any other versions)
> > does not split on apostrophes for English contractions. How was
> > this handled during the WMT? Did you use the official NIST scripts
> > for BLEU calculation after detokenization? If yes, this would
> > severely penalize the use of contractions over non-contracted
> > forms (around 2-3% BLEU). Is this just generally accepted?
> >
> > Thanks,
> >
> > Marcin
> >
> >
> > _______________________________________________
> > Moses-support mailing list
> > [email protected]
> > http://mailman.mit.edu/mailman/listinfo/moses-support
> >
>
--
Ondrej Bojar (mailto:[email protected] / [email protected])
http://www.cuni.cz/~obo
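
For readers who want to see the contraction effect discussed in this thread on a concrete sentence, here is a toy sketch in Python. It is not the NIST script and does not reproduce its tokenization; the two tokenizers (tok_plain, tok_split), the helper names and the example sentences are all assumptions made up for illustration. It only counts the clipped n-gram matches that BLEU's modified precision is built from, once with contractions left intact and once with a split at the apostrophe.

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_matches(hyp, ref, n):
    """Return (matched, total) clipped n-gram counts for one sentence pair."""
    h, r = ngrams(hyp, n), ngrams(ref, n)
    matched = sum(min(count, r[gram]) for gram, count in h.items())
    return matched, sum(h.values())


def tok_plain(text):
    # Assumption: stand-in for a tokenizer that leaves "we're" as one token.
    return text.lower().split()


def tok_split(text):
    # Assumption: hypothetical tokenizer that splits at a word-internal
    # apostrophe, so "we're" becomes "we" + "'re".
    return re.sub(r"(\w)'(\w)", r"\1 '\2", text.lower()).split()


ref = "We are ready to go"   # reference uses the non-contracted form
hyp = "We're ready to go"    # hypothesis uses the contraction

for name, tok in (("plain", tok_plain), ("split", tok_split)):
    for n in (1, 2):
        matched, total = clipped_matches(tok(hyp), tok(ref), n)
        print(f"{name} tokenization, {n}-grams: {matched}/{total} matched")
```

With the contraction left intact, the hypothesis matches neither "we" nor "are"; with the split, "we" is recovered but "'re" still does not match "are", so splitting on apostrophes alone only removes part of the penalty.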
