Oh, perfect! According to this table, mteval with --international-tokenization has a slightly better average correlation with human judgment for English. With this option enabled, mteval does split contractions, since it surrounds any non-letter character with spaces. Exactly the justification I needed :) Thanks, Ondrej.
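To make the effect concrete, here is a minimal Python sketch. It is not mteval's actual tokenizer, and the function names are made up for illustration. It only approximates the behaviour described above by putting spaces around every character that is neither a word character nor whitespace, and then counts clipped unigram matches in the way BLEU does for 1-grams.

```python
import re
from collections import Counter


def whitespace_tokens(text):
    # Baseline assumption: tokens are split on whitespace only,
    # so "we're" stays a single token.
    return text.split()


def split_nonword_tokens(text):
    # Rough stand-in for the behaviour described in the mail: surround every
    # character that is neither a word character nor whitespace with spaces.
    return re.sub(r"([^\w\s])", r" \1 ", text).split()


def clipped_unigram_matches(hyp_tokens, ref_tokens):
    # Each hypothesis token is credited at most as often as it occurs
    # in the reference (BLEU-style clipping, 1-grams only).
    hyp_counts, ref_counts = Counter(hyp_tokens), Counter(ref_tokens)
    return sum(min(count, ref_counts[tok]) for tok, count in hyp_counts.items())


hypothesis = "we're happy"
reference = "we are happy"

for tokenize in (whitespace_tokens, split_nonword_tokens):
    hyp, ref = tokenize(hypothesis), tokenize(reference)
    print(tokenize.__name__, hyp, ref, clipped_unigram_matches(hyp, ref))
```

With whitespace-only tokens, "we're" matches nothing in "we are happy"; after splitting, "we" is matched, although "re" and "are" still differ.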
Best,
Marcin

On 10.10.2014 23:51, Ondrej Bojar wrote:
> We do both, NIST BLEU as well as moses tokenizer.perl followed by moses
> scorer (the same one as used by mert), they obviously come out differently.
>
> The WMT14 tables somehow report just one of the BLEUs and I'd have to dig it
> up to see which was that. The WMT13 tables are clear, see
> http://www.statmt.org/wmt13/pdf/WMT02.pdf, NIST's mteval seems marginally
> better.
>
> O.
>
> ----- Original Message -----
>> From: "Marcin Junczys-Dowmunt" <[email protected]>
>> To: "Ondrej Bojar" <[email protected]>
>> Cc: [email protected]
>> Sent: Friday, 10 October, 2014 11:30:57 PM
>> Subject: Re: [Moses-support] BLEU evaluation at WMT - contractions
>>
>> Hi Ondrej,
>> during the metrics task, is your baseline BLEU score also calculated by
>> the NIST script with all its consequences?
>>
>> Marcin
>>
>> On 10.10.2014 22:28, Ondrej Bojar wrote:
>>> Hi,
>>>
>>> I find only one good thing about the standard NIST script: that it is
>>> beyond the control of anyone of us. ;-) It could be better only if the
>>> code were not available. Then we'd have a truly black box measure. ;-)
>>>
>>> Yes, you're definitely right that mismatch in register should not be
>>> penalized so much, but for WMT translation, we're actually not looking at
>>> automatic scores at all. In some years, I think they were not even
>>> reported in the paper. That's what the metric task is for: to promote
>>> metrics that look at just the right things.
>>>
>>> Cheers, O.
>>>
>>> ----- Original Message -----
>>>> From: "Marcin Junczys-Dowmunt" <[email protected]>
>>>> To: "Philipp Koehn" <[email protected]>
>>>> Cc: [email protected]
>>>> Sent: Friday, 10 October, 2014 6:59:03 PM
>>>> Subject: Re: [Moses-support] BLEU evaluation at WMT - contractions
>>>>
>>>> Thanks for the quick answer.
>>>>
>>>> I admire the stoicism :) I find it painful to see that contractions are
>>>> not handled by the official script.
>>>> You get two errors for not hitting
>>>> "we" and "are" when you have "we're" which is actually the same (modulo
>>>> style). Also, I guess the news domain has less issues with contractions
>>>> otherwise you might have heard more complaints. Unfortunately I have to
>>>> provide results in WMT-style, so there is no way around that script.
>>>> METEOR does it right by the way.
>>>>
>>>> On 10.10.2014 17:44, Philipp Koehn wrote:
>>>>> Hi,
>>>>>
>>>>> there are a lot of issues with tokenization.
>>>>>
>>>>> The BLEU scores we report in WMT are using the standard NIST script,
>>>>> which expects detokenized and properly cased output. The script does
>>>>> its own internal tokenization, we just accept that.
>>>>>
>>>>> Another way to compute BLEU scores is with multi-bleu.perl - which
>>>>> completely accepts your tokenization.
>>>>>
>>>>> -phi
>>>>>
>>>>>
>>>>> On Fri, Oct 10, 2014 at 11:12 AM, Marcin Junczys-Dowmunt
>>>>> <[email protected]> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> slightly off-topic: I have a question concerning the evaluation
>>>>> practice during WMT. I have noticed that the standard NIST script
>>>>> mteval-v1.3a.pl (or any other versions)
>>>>> does not split on apostrophes for English contractions. How was
>>>>> this handled during the WMT? Did you use the official NIST scripts
>>>>> for BLEU calculation after detokenization? If yes, this would
>>>>> severely penalize the use of contractions over non-contracted
>>>>> forms (around 2-3% BLEU), is this just generally accepted?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Marcin
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Moses-support mailing list
>>>>> [email protected]
>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
