Thanks for the quick answer.

I admire the stoicism :) I find it painful that contractions are not handled by the official script. When the hypothesis has "we're" and the reference has "we are", which is the same thing modulo style, you get two errors for missing both "we" and "are". I also guess the news domain has fewer problems with contractions; otherwise you would probably have heard more complaints. Unfortunately, I have to report results WMT-style, so there is no way around that script. METEOR gets this right, by the way.
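A minimal Python sketch of the "two errors" point, using plain whitespace tokens (this is not code from mteval or METEOR). It lists which reference tokens a hypothesis leaves uncovered. The second hypothesis assumes a hypothetical pre-tokenization that splits "we're" into "we 're".

```python
from collections import Counter

def missed_reference_tokens(hypothesis: str, reference: str) -> list[str]:
    """Reference tokens (with multiplicity) that the hypothesis does not cover."""
    # Counter subtraction keeps only positive counts, i.e. unmatched reference tokens.
    uncovered = Counter(reference.split()) - Counter(hypothesis.split())
    return sorted(uncovered.elements())

reference = "we are ready"
print(missed_reference_tokens("we're ready", reference))   # ['are', 'we']
print(missed_reference_tokens("we 're ready", reference))  # ['are']
print(missed_reference_tokens("we are ready", reference))  # []
```

With the contraction kept as one token, both "we" and "are" go unmatched; splitting off "'re" recovers only "we".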

On 10.10.2014 17:44, Philipp Koehn wrote:
Hi,

there are a lot of issues with tokenization.

The BLEU scores we report in WMT are using the standard NIST script,
which expects detokenized and properly cased output. The script does
its own internal tokenization; we just accept that.
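As a rough illustration of metric-internal tokenization, here is a hypothetical Python approximation. It is not the NIST script itself, and the punctuation set is an assumption chosen for illustration. It pads punctuation with spaces and splits off periods and commas that are not part of a number, but, matching the behaviour discussed in this thread, it never splits on the apostrophe. Case is preserved, since the script is described above as expecting properly cased output.

```python
import re

# Illustrative punctuation set (an assumption); the apostrophe is deliberately absent.
PUNCT = set("!\"#$%&()*+/:;<=>?@[\\]^_`{|}~")

def nist_like_tokenize(text: str) -> list[str]:
    """Hypothetical approximation of a BLEU script's internal tokenizer."""
    # Surround every listed punctuation character with spaces.
    text = "".join(f" {ch} " if ch in PUNCT else ch for ch in text)
    # Split '.' and ',' away from non-digit neighbours, so 3.14 or 1,000 stay intact.
    text = re.sub(r"([^0-9])([.,])", r"\1 \2 ", text)
    text = re.sub(r"([.,])([^0-9])", r" \1 \2", text)
    return text.split()

print(nist_like_tokenize("We're ready, aren't we?"))
```

Commas and question marks become separate tokens, while "We're" and "aren't" survive as single tokens.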

Another way to compute BLEU scores is with multi-bleu.perl, which
takes your tokenization as-is.
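For comparison, a minimal Python sketch of corpus-level BLEU computed directly on whitespace-separated tokens, in the spirit of multi-bleu.perl but not a port of it. It assumes a single reference per segment and uses modified n-gram precisions up to 4, their geometric mean, and a brevity penalty. Since there is no internal tokenization, whatever tokenization you apply beforehand is exactly what gets scored.

```python
import math
from collections import Counter

def ngrams(tokens: list[str], n: int) -> Counter:
    """Counts of all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses: list[str], references: list[str], max_n: int = 4) -> float:
    """Corpus BLEU (0-100) over whitespace tokens, one reference per segment."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()  # no tokenization beyond whitespace splitting
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_ngrams, r_ngrams = ngrams(h, n), ngrams(r, n)
            # Clipped matches: each n-gram counts at most as often as in the reference.
            matches[n - 1] += sum(min(c, r_ngrams[g]) for g, c in h_ngrams.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # any zero n-gram precision makes the geometric mean zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)

ref = ["we are ready to go now"]
print(corpus_bleu(["we're ready to go now"], ref))
print(corpus_bleu(["we 're ready to go now"], ref))
```

Scoring the same sentence with "we're" versus "we 're" against the same reference shows that the pre-tokenization choice alone moves the score.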

-phi


On Fri, Oct 10, 2014 at 11:12 AM, Marcin Junczys-Dowmunt <[email protected]> wrote:

    Hi,

    slightly off-topic: I have a question concerning the evaluation
    practice during WMT. I have noticed that the standard NIST script
    mteval-v13a.pl (or any other version) does not split on
    apostrophes in English contractions. How was this handled at
    WMT? Did you use the official NIST scripts for BLEU calculation
    after detokenization? If so, this would severely penalize
    contracted forms relative to non-contracted ones (by around
    2-3% BLEU). Is this just generally accepted?

    Thanks,

    Marcin


    _______________________________________________
    Moses-support mailing list
    [email protected]
    http://mailman.mit.edu/mailman/listinfo/moses-support
