Hi Ondrej,

during the metrics task, is your baseline BLEU score also calculated by the
NIST script, with all its consequences?

Marcin

On 10.10.2014 22:28, Ondrej Bojar wrote:
> Hi,
>
> I find only one good thing about the standard NIST script: that it is beyond
> the control of anyone of us. ;-) It could be better only if the code were not
> available. Then we'd have a truly black box measure. ;-)
>
> Yes, you're definitely right that mismatch in register should not be
> penalized so much, but for WMT translation, we're actually not looking at
> automatic scores at all. In some years, I think they were not even reported
> in the paper. That's what the metric task is for: to promote metrics that
> look at just the right things.
>
> Cheers, O.
>
> ----- Original Message -----
>> From: "Marcin Junczys-Dowmunt" <[email protected]>
>> To: "Philipp Koehn" <[email protected]>
>> Cc: [email protected]
>> Sent: Friday, 10 October, 2014 6:59:03 PM
>> Subject: Re: [Moses-support] BLEU evaluation at WMT - contractions
>>
>> Thanks for the quick answer.
>>
>> I admire the stoicism :) I find it painful to see that contractions are
>> not handled by the official script. You get two errors for not hitting
>> "we" and "are" when you have "we're" which is actually the same (modulo
>> style). Also, I guess the news domain has less issues with contractions
>> otherwise you might have heard more complaints. Unfortunately I have to
>> provide results in WMT-style, so there is no way around that script.
>> METEOR does it right by the way.
>>
>> On 10.10.2014 17:44, Philipp Koehn wrote:
>>> Hi,
>>>
>>> there are a lot of issues with tokenization.
>>>
>>> The BLEU scores we report in WMT are using the standard NIST script,
>>> which expects detokenized and properly cased output. The script does
>>> its own internal tokenization, we just accept that.
>>>
>>> Another way to compute BLEU scores is with multi-bleu.perl - which
>>> completely accepts your tokenization.
>>>
>>> -phi
>>>
>>>
>>> On Fri, Oct 10, 2014 at 11:12 AM, Marcin Junczys-Dowmunt
>>> <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> slightly off-topic: I have a question concerning the evaluation
>>> practice during WMT. I have noticed that the standard NIST script
>>> mteval-v1.3a.pl (or any other versions)
>>> does not split on apostrophes for English contractions. How was
>>> this handled during the WMT? Did you use the official NIST scripts
>>> for BLEU calculation after detokenization? If yes, this would
>>> severely penalize the use of contractions over non-contracted
>>> forms (around 2-3% BLEU), is this just generally accepted?
>>>
>>> Thanks,
>>>
>>> Marcin
>>>
>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> [email protected]
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
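
To make the contraction point in the thread concrete, here is a minimal toy sketch of clipped n-gram precision, the quantity BLEU is built from. It is an illustration only, not the NIST script or multi-bleu.perl: tokenization is plain whitespace splitting, the sentences are made up, and split_contractions is a hypothetical helper (not part of any of the scripts discussed) showing what splitting on apostrophes would change.

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_precision(hypothesis, reference, n):
    """Clipped n-gram precision of a hypothesis against a single reference."""
    hyp_counts = Counter(ngrams(hypothesis, n))
    ref_counts = Counter(ngrams(reference, n))
    # Each hypothesis n-gram is credited at most as often as it occurs
    # in the reference (the "clipping" used in BLEU).
    matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
    total = sum(hyp_counts.values())
    return matched / total if total else 0.0


def split_contractions(text):
    """Hypothetical helper: insert a space before a word-internal apostrophe."""
    return re.sub(r"(\w)'(\w)", r"\1 '\2", text)


# Made-up example sentences; the reference uses the uncontracted form.
reference = "we are looking at the scores".split()
uncontracted = "we are looking at the scores".split()
contracted = "we're looking at the scores".split()
contracted_split = split_contractions("we're looking at the scores").split()

for name, hyp in [("uncontracted", uncontracted),
                  ("contracted", contracted),
                  ("contracted, split", contracted_split)]:
    p1 = ngram_precision(hyp, reference, 1)
    p2 = ngram_precision(hyp, reference, 2)
    print(f"{name}: unigram={p1:.3f} bigram={p2:.3f}")
```

In this toy setup the contracted hypothesis matches 4 of its 5 unigrams and 3 of its 4 bigrams, while the uncontracted one matches everything. Splitting at the apostrophe recovers "we" as a unigram but still misses "are", so tokenization alone does not make the two styles equivalent.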
