IIRC, the principal difference is the calculation of the brevity penalty, but there also seem to be some slight differences in tokenization between the two scripts.
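If I remember right, mteval takes the shortest reference length when computing the brevity penalty, while multi-bleu.perl takes the reference length closest to the hypothesis. A minimal Python sketch of the two choices (function names are mine, not from either script):

```python
import math

def brevity_penalty(hyp_len, ref_len):
    """Standard BLEU brevity penalty: exp(1 - r/c) when the hypothesis is shorter."""
    if hyp_len > ref_len:
        return 1.0
    return math.exp(1.0 - ref_len / hyp_len)

def closest_ref_len(hyp_len, ref_lens):
    # multi-bleu.perl-style choice: the reference length closest to the hypothesis
    # (ties broken toward the shorter reference)
    return min(ref_lens, key=lambda r: (abs(r - hyp_len), r))

def shortest_ref_len(ref_lens):
    # mteval-style choice (IIRC): the shortest reference length
    return min(ref_lens)

# Sanity check against the lengths in the report below:
print(round(brevity_penalty(281, 309), 3))  # 0.905, matching BP=0.905

# With multiple references the two length choices can diverge:
hyp_len, ref_lens = 281, [250, 285, 400]
print(brevity_penalty(hyp_len, closest_ref_len(hyp_len, ref_lens)))  # ~0.986 (r=285)
print(brevity_penalty(hyp_len, shortest_ref_len(ref_lens)))          # 1.0    (r=250)
```

With a single reference per sentence both choices coincide, so in that case any remaining score difference would have to come from tokenization.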
On Fri, Mar 19, 2010 at 9:32 AM, Mark Fishel <[email protected]> wrote:
> Dear list,
>
> I am getting different BLEU scores from the NIST mteval script
> (version) and the multi-bleu.perl script within Moses's distribution
> for the same reference and hypothesis translations -- even the
> individual n-gram precisions are different:
>
> BLEU = 16.80, 53.0/26.2/13.4/6.4 (BP=0.905, ratio=0.909, hyp_len=281,
> ref_len=309)
>
> and
>
> BLEU score = 0.1681 for system "x"
>
> Individual N-gram scoring
>        1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
>        ------  ------  ------  ------  ------  ------  ------  ------  ------
> BLEU:  0.5246  0.2591  0.1326  0.0630  0.0328  0.0213  0.0133  0.0046  0.0000  "x"
>
> The files that produced the scores are here: mtj.ut.ee/diffbleu.tgz .
>
> Does everyone else get different scores? Can anyone suggest a reason
> for that? It's not the smoothing of the NIST script, both support UTF8
> i/o, etc; so I'm out of ideas, and before comparing the
> implementations I wanted to ask for opinions.
>
> Thanks in advance,
> Mark
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
