Hi, there are a lot of issues with tokenization.
The BLEU scores we report in WMT use the standard NIST script, which expects detokenized and properly cased output. The script does its own internal tokenization, and we just accept that.

Another way to compute BLEU scores is with multi-bleu.perl, which takes your tokenization exactly as given.

-phi

On Fri, Oct 10, 2014 at 11:12 AM, Marcin Junczys-Dowmunt <[email protected]> wrote:

> Hi,
>
> slightly off-topic: I have a question concerning the evaluation practice
> during WMT. I have noticed that the standard NIST script mteval-v1.3a.pl
> (or any other versions) does not split on apostrophes for English
> contractions. How was this handled during the WMT? Did you use the official
> NIST scripts for BLEU calculation after detokenization? If yes, this would
> severely penalize the use of contractions over non-contracted forms (around
> 2-3% BLEU), is this just generally accepted?
>
> Thanks,
>
> Marcin
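As a rough illustration of why the tokenization choice matters for contractions, here is a minimal sketch in Python. It is not the logic of the NIST script or of multi-bleu.perl: the two tokenizers and the example sentences are hypothetical, and it computes only a clipped n-gram precision (one ingredient of BLEU), not a full BLEU score.

```python
import re
from collections import Counter


def whitespace_tokenize(text):
    """Split on whitespace only, so a contraction like "it's" stays one token."""
    return text.split()


def split_apostrophes(text):
    """Hypothetical rule: break before an apostrophe between letters ("it's" -> "it 's")."""
    return re.sub(r"(?<=[A-Za-z])'(?=[A-Za-z])", " '", text).split()


def ngram_precision(hyp_tokens, ref_tokens, n=1):
    """Clipped n-gram precision of a hypothesis against a single reference."""
    hyp = Counter(tuple(hyp_tokens[i:i + n]) for i in range(len(hyp_tokens) - n + 1))
    ref = Counter(tuple(ref_tokens[i:i + n]) for i in range(len(ref_tokens) - n + 1))
    # Each hypothesis n-gram is credited at most as often as it occurs in the reference.
    matched = sum(min(count, ref[gram]) for gram, count in hyp.items())
    total = sum(hyp.values())
    return matched / total if total else 0.0


hypothesis = "it's a test"    # made-up system output using a contraction
reference = "it is a test"    # made-up reference using the full form

for tokenize in (whitespace_tokenize, split_apostrophes):
    p1 = ngram_precision(tokenize(hypothesis), tokenize(reference), n=1)
    print(f"{tokenize.__name__}: unigram precision = {p1:.3f}")
```

With whitespace-only tokenization the contraction matches nothing in the reference; once the apostrophe is split off, the "it" part still gets credit. Real scores depend on the exact tokenizer, on the higher-order n-grams and on the brevity penalty, so this only shows the direction of the effect, not its size.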
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
