Re: [Moses-support] BLEU evaluation at WMT - contractions

Marcin Junczys-Dowmunt Fri, 10 Oct 2014 22:47:59 -0700

Oh, perfect! According to this table, mteval with 
--international-tokenization has a slightly better average correlation 
with human judgment for English. With this option enabled, mteval does 
split contractions, as it surrounds any non-letter character with 
spaces. Exactly the justification I needed :)
Thanks, Ondrej.
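
To make the effect concrete, here is a minimal sketch in Python. It is not
the mteval code. It assumes a simplified rule (put spaces around every
character that is not a letter, digit or whitespace) and uses hypothetical
helper names, split_non_letters and unigram_matches. It only counts clipped
unigram matches and does not compute a BLEU score.

```python
import re
from collections import Counter


def split_non_letters(text):
    """Simplified assumption, not mteval itself: surround every character
    that is not a letter, digit or whitespace with spaces, then split."""
    return re.sub(r"([^\w\s]|_)", r" \1 ", text).split()


def unigram_matches(hypothesis_tokens, reference_tokens):
    """Clipped unigram match count (the numerator of 1-gram precision)."""
    hyp = Counter(hypothesis_tokens)
    ref = Counter(reference_tokens)
    # Counter intersection keeps the minimum count per token (clipping).
    return sum((hyp & ref).values())


hyp = "we're late"
ref = "we are late"

# Whitespace tokens only: "we're" matches neither "we" nor "are".
plain = unigram_matches(hyp.split(), ref.split())

# After splitting on the apostrophe, "we" becomes a matching token.
split = unigram_matches(split_non_letters(hyp), split_non_letters(ref))

print(split_non_letters(hyp))  # ['we', "'", 're', 'late']
print(plain, split)            # 1 2
```

The split version recovers the match on "we", but it also adds the tokens
"'" and "re" to the hypothesis, so the effect on the final score depends on
the full n-gram statistics and the brevity penalty.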


Best,
Marcin

On 10.10.2014 23:51, Ondrej Bojar wrote:
> We do both: NIST BLEU, as well as moses tokenizer.perl followed by the moses 
> scorer (the same one as used by mert). They obviously come out differently.
>
> The WMT14 tables somehow report just one of the BLEUs and I'd have to dig it 
> up to see which one it was. The WMT13 tables are clear, see 
> http://www.statmt.org/wmt13/pdf/WMT02.pdf; NIST's mteval seems marginally 
> better.
>
> O.
>
> ----- Original Message -----
>> From: "Marcin Junczys-Dowmunt" <[email protected]>
>> To: "Ondrej Bojar" <[email protected]>
>> Cc: [email protected]
>> Sent: Friday, 10 October, 2014 11:30:57 PM
>> Subject: Re: [Moses-support] BLEU evaluation at WMT - contractions
>>
>> Hi Ondrej,
>> during the metrics task, is your baseline BLEU score also calculated by
>> the NIST script with all its consequences?
>>
>> Marcin
>>
>> On 10.10.2014 22:28, Ondrej Bojar wrote:
>>> Hi,
>>>
>>> I find only one good thing about the standard NIST script: that it is
>>> beyond the control of any of us. ;-) It could be better only if the
>>> code were not available. Then we'd have a truly black-box measure. ;-)
>>>
>>> Yes, you're definitely right that mismatch in register should not be
>>> penalized so much, but for WMT translation, we're actually not looking at
>>> automatic scores at all. In some years, I think they were not even
>>> reported in the paper. That's what the metric task is for: to promote
>>> metrics that look at just the right things.
>>>
>>> Cheers, O.
>>>
>>> ----- Original Message -----
>>>> From: "Marcin Junczys-Dowmunt" <[email protected]>
>>>> To: "Philipp Koehn" <[email protected]>
>>>> Cc: [email protected]
>>>> Sent: Friday, 10 October, 2014 6:59:03 PM
>>>> Subject: Re: [Moses-support] BLEU evaluation at WMT - contractions
>>>>
>>>> Thanks for the quick answer.
>>>>
>>>> I admire the stoicism :) I find it painful to see that contractions are
>>>> not handled by the official script. You get two errors for not hitting
>>>> "we" and "are" when you have "we're", which is actually the same (modulo
>>>> style). Also, I guess the news domain has fewer issues with contractions;
>>>> otherwise you might have heard more complaints. Unfortunately, I have to
>>>> provide results in WMT-style, so there is no way around that script.
>>>> METEOR does it right, by the way.
>>>>
>>>> On 10.10.2014 17:44, Philipp Koehn wrote:
>>>>> Hi,
>>>>>
>>>>> there are a lot of issues with tokenization.
>>>>>
>>>>> The BLEU scores we report in WMT are using the standard NIST script,
>>>>> which expects detokenized and properly cased output. The script does
>>>>> its own internal tokenization; we just accept that.
>>>>>
>>>>> Another way to compute BLEU scores is with multi-bleu.perl - which
>>>>> completely accepts your tokenization.
>>>>>
>>>>> -phi
>>>>>
>>>>>
>>>>> On Fri, Oct 10, 2014 at 11:12 AM, Marcin Junczys-Dowmunt
>>>>> <[email protected]> wrote:
>>>>>
>>>>>       Hi,
>>>>>
>>>>>       slightly off-topic: I have a question concerning the evaluation
>>>>>       practice during WMT. I have noticed that the standard NIST script
>>>>>       mteval-v1.3a.pl (or any other version) does not split on
>>>>>       apostrophes for English contractions. How was this handled
>>>>>       during WMT? Did you use the official NIST scripts for BLEU
>>>>>       calculation after detokenization? If yes, this would severely
>>>>>       penalize the use of contractions over non-contracted forms
>>>>>       (around 2-3% BLEU). Is this just generally accepted?
>>>>>
>>>>>       Thanks,
>>>>>
>>>>>       Marcin
>>>>>
>>>>>
>>>>>       _______________________________________________
>>>>>       Moses-support mailing list
>>>>>       [email protected]
>>>>>       http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>>
>>>>>

