Re: [Moses-support] BLEU evaluation at WMT - contractions

Philipp Koehn Fri, 10 Oct 2014 08:47:02 -0700

Hi,

there are a lot of issues with tokenization.


The BLEU scores we report in WMT are computed with the standard NIST script,
which expects detokenized and properly cased output. The script does
its own internal tokenization; we just accept that.

Another way to compute BLEU scores is with multi-bleu.perl - which
completely accepts your tokenization.
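To see why the choice of tokenization matters, here is a minimal sketch.
It is not the NIST script or multi-bleu.perl: the two tokenizers, the
helper names and the example sentences are all made up for illustration.
It only counts clipped n-gram matches, the quantity that BLEU precision
is built from, for the same hypothesis/reference pair under two
tokenizations.

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_matches(hyp_tokens, ref_tokens, n):
    """Return (matched, total) n-gram counts, clipped by the reference counts."""
    hyp_counts = ngrams(hyp_tokens, n)
    ref_counts = ngrams(ref_tokens, n)
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return matched, sum(hyp_counts.values())


def tokenize_keep_contractions(text):
    """Assumed tokenizer A: whitespace only, so "don't" stays one token."""
    return text.split()


def tokenize_split_contractions(text):
    """Assumed tokenizer B: split before an apostrophe between two letters."""
    return re.sub(r"([A-Za-z])'([A-Za-z])", r"\1 '\2", text).split()


# Made-up sentence pair; both sides use contractions.
hyp = "I don't think it's ready"
ref = "I don't believe it's ready"

for name, tok in [("keep", tokenize_keep_contractions),
                  ("split", tokenize_split_contractions)]:
    for n in (1, 2):
        matched, total = clipped_matches(tok(hyp), tok(ref), n)
        print(f"{name:5s} {n}-gram precision: {matched}/{total}")
```

The translation is identical in both runs; only the token boundaries
change, and the n-gram precisions change with them. Scores computed
under different tokenizations are therefore not directly comparable.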

-phi



On Fri, Oct 10, 2014 at 11:12 AM, Marcin Junczys-Dowmunt <[email protected]> wrote:

>  Hi,
>
> slightly off-topic: I have a question concerning the evaluation practice
> during WMT. I have noticed that the standard NIST script mteval-v13a.pl
> (or any other version) does not split on apostrophes for English
> contractions. How was this handled during WMT? Did you use the official
> NIST scripts for BLEU calculation after detokenization? If yes, this would
> severely penalize the use of contractions over non-contracted forms (around
> 2-3% BLEU). Is this just generally accepted?
>
> Thanks,
>
> Marcin
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>

