Thanks for the quick answer.
I admire the stoicism :) I find it painful to see that contractions are
not handled by the official script. You get two errors for not hitting
"we" and "are" when you have "we're", which is actually the same (modulo
style). Also, I guess the news domain has fewer issues with contractions;
otherwise you might have heard more complaints. Unfortunately I have to
provide results in WMT-style, so there is no way around that script.
METEOR does it right by the way.
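To make the two misses concrete, here is a toy sketch. It is not the NIST tokenizer and not how any official script works; the function names and the apostrophe-splitting rule are made up for illustration only. It counts clipped unigram matches between a hypothesis and a reference, once with plain whitespace tokens and once with contractions split off.

```python
import re
from collections import Counter


def clipped_unigram_matches(hyp_tokens, ref_tokens):
    """Count hypothesis tokens that also occur in the reference (clipped counts)."""
    hyp_counts = Counter(hyp_tokens)
    ref_counts = Counter(ref_tokens)
    return sum(min(count, ref_counts[token]) for token, count in hyp_counts.items())


def split_contractions(text):
    """Hypothetical toy tokenizer: split an apostrophe suffix off a word.

    "we're" becomes "we" + "'re". This is an assumption for illustration,
    not the rule used by any evaluation script.
    """
    return re.sub(r"(\w)('\w+)", r"\1 \2", text).split()


hyp = "we're happy"
ref = "we are happy"

# Whitespace tokens only: "we're" matches neither "we" nor "are".
plain = clipped_unigram_matches(hyp.split(), ref.split())

# With the contraction split off, "we" matches; "'re" vs "are" still does not.
split = clipped_unigram_matches(split_contractions(hyp), split_contractions(ref))

print(plain, split)
```

Without the split only "happy" matches, so both "we" and "are" in the reference go unmatched; with the split, "we" is recovered.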
On 10.10.2014 17:44, Philipp Koehn wrote:
Hi,
there are a lot of issues with tokenization.
The BLEU scores we report in WMT are using the standard NIST script,
which expects detokenized and properly cased output. The script does
its own internal tokenization; we just accept that.
Another way to compute BLEU scores is with multi-bleu.perl, which
accepts your tokenization exactly as it is.
-phi
On Fri, Oct 10, 2014 at 11:12 AM, Marcin Junczys-Dowmunt
<[email protected]> wrote:
Hi,
slightly off-topic: I have a question concerning the evaluation
practice during WMT. I have noticed that the standard NIST script
mteval-v1.3a.pl (or any other version)
does not split on apostrophes for English contractions. How was
this handled during the WMT? Did you use the official NIST scripts
for BLEU calculation after detokenization? If yes, this would
severely penalize the use of contractions over non-contracted
forms (around 2-3% BLEU). Is this just generally accepted?
Thanks,
Marcin
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support