Hi, I think you are right - the first set of numbers is the n-gram precision for each order of n-gram. The second set is what you get if you take the geometric mean of the n-gram precisions up to that order. Hence, the number under 4-gram is the usual BLEU score.
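To make that concrete, here is a minimal Python sketch (not the mteval-v13a.pl code itself; the function name and the example precisions are made up) of how the cumulative column follows from the individual n-gram precisions - the score at order n is the brevity penalty times the geometric mean of precisions 1..n:

```python
import math

def cumulative_bleu(precisions, brevity_penalty=1.0):
    """Cumulative BLEU at each order 1..len(precisions).

    precisions: modified n-gram precisions p_1, p_2, ...
    brevity_penalty: exp of the log penalty (1.0 when penalty (log) is 0).
    """
    scores = []
    log_sum = 0.0
    for n, p in enumerate(precisions, start=1):
        log_sum += math.log(p)  # accumulate log precisions
        # geometric mean of p_1..p_n, scaled by the brevity penalty
        scores.append(brevity_penalty * math.exp(log_sum / n))
    return scores

# With made-up precisions, the 4th entry is the usual 4-gram BLEU.
scores = cumulative_bleu([0.5, 0.3, 0.2, 0.1])
```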
The BLEU score is traditionally computed for 1-4 grams; the original BLEU paper discusses this. There was the expectation that if machine translation gets better, we should use higher-order BLEU, but we never did.

-phi

On Wed, Oct 26, 2016 at 12:44 AM, Nat Gillin <nat.gil...@gmail.com> wrote:

> Dear Moses community,
>
> Ah, I found out what the cumulative means. The cumulative scores are the
> usual BLEU scores that we report, because each one includes the orders of
> n-grams before the desired order.
>
> The only odd numbers from mteval-v13a.pl are the individual BLEU scores.
> Is it right that the individual BLEU scores are the bp * weights *
> modified_precision for each order of n-gram? Are there corresponding
> papers that investigate these numbers?
>
> Regards,
> Nat
>
> On Tue, Oct 25, 2016 at 12:02 PM, Nat Gillin <nat.gil...@gmail.com> wrote:
>
>> Dear Moses community,
>>
>> To make the question clearer:
>>
>> Why does the cumulative score add the brevity penalty before taking the
>> exponent at every order of n-gram, while the individual score only takes
>> the brevity penalty into account at
>> https://github.com/moses-smt/mosesdecoder/blob/master/scripts/generic/mteval-v13a.pl#L874
>>
>> Any pointers to the papers describing the cumulative score would be
>> nice =)
>>
>> Thanks in advance again,
>> Nat
>>
>> On Tue, Oct 25, 2016 at 11:58 AM, Nat Gillin <nat.gil...@gmail.com> wrote:
>>
>>> Dear Moses Community,
>>>
>>> When using mteval-v13a.pl, we note that the output looks like this:
>>>
>>> length ratio: 1.07303974221267 (1998/1862), penalty (log): 0
>>> NIST score = 5.0564  BLEU score = 0.2318 for system "Google"
>>>
>>> # ------------------------------------------------------------------------
>>>
>>> Individual N-gram scoring
>>>         1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
>>>         ------  ------  ------  ------  ------  ------  ------  ------  ------
>>> NIST:   4.4488  0.5554  0.0477  0.0045  0.0000  0.0000  0.0000  0.0000  0.0000  "Google"
>>> BLEU:   0.5415  0.2972  0.1752  0.1025  0.0626  0.0354  0.0193  0.0085  0.0017  "Google"
>>>
>>> # ------------------------------------------------------------------------
>>>
>>> Cumulative N-gram scoring
>>>         1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
>>>         ------  ------  ------  ------  ------  ------  ------  ------  ------
>>> NIST:   4.4488  5.0043  5.0520  5.0564  5.0564  5.0564  5.0564  5.0564  5.0564  "Google"
>>> BLEU:   0.5415  0.4012  0.3044  0.2318  0.1784  0.1362  0.1031  0.0754  0.0493  "Google"
>>>
>>> And at
>>> https://github.com/moses-smt/mosesdecoder/blob/master/scripts/generic/mteval-v13a.pl#L823,
>>> it calculates the cumulative score by accumulating the individual n-gram
>>> precisions, at each order adding to the running sum and normalizing it,
>>> before computing the cumulative score for that order of n-gram.
>>>
>>> The question is: why does it add the brevity penalty (i.e. $len_score)?
>>>
>>> Also, is this score discussed in any paper?
>>>
>>> Thanks in advance for the clarifications!
>>>
>>> Regards,
>>> Nat
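For what it's worth, the distinction being asked about can be sketched in a few lines of Python (hypothetical function names, not the mteval-v13a.pl code; log_bp stands for the log brevity penalty, which is 0 in the output above so the penalty factor is 1):

```python
import math

def individual_score(p_n, log_bp=0.0):
    """Individual n-gram score: brevity penalty applied to one precision."""
    return math.exp(log_bp + math.log(p_n))

def cumulative_score(precisions, log_bp=0.0):
    """Cumulative score at order n = len(precisions).

    The log brevity penalty is added to the *averaged* sum of log
    precisions before exponentiating, so the penalty is applied once to
    the geometric mean rather than once per order.
    """
    n = len(precisions)
    return math.exp(log_bp + sum(math.log(p) for p in precisions) / n)
```

With log_bp = 0 both reduce to the raw precision and the plain geometric mean, which matches the example output where the 1-gram individual and cumulative scores coincide.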
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support