Hi Daniel,

BLEU scores do vary according to the test set, but the scores you report are much higher than usual.
The most likely explanation is that some of your test set is included in your training set.

cheers - Barry

On Thursday 26 April 2012 19:18:33 Daniel Schaut wrote:
> Hi all,
>
> I'm running some experiments for my thesis, and I've been told by a more
> experienced user that the BLEU/METEOR scores achieved by my MT engine
> were too good to be true. Since this is the very first MT engine I've ever
> built and I am not experienced in interpreting scores, I really don't know
> what to make of them. The first test set achieves a BLEU score of 0.6508
> (v13). METEOR's final score is 0.7055 (v1.3, exact, stem, paraphrase). A
> second test set yielded a slightly lower BLEU score of 0.6267 and a
> METEOR score of 0.6748.
>
> Here are some basic facts about my system:
> Decoding direction: EN-DE
> Training corpus: 1.8 million sentences
> Tuning runs: 5
> Test sets: a) 2,000 sentences, b) 1,000 sentences (both in-domain)
> LM type: trigram
> TM type: unfactored
>
> I'm now trying to figure out whether these scores are realistic at all, as
> various papers report far lower BLEU scores, e.g. Koehn and Hoang
> 2011. Any comments regarding the mentioned decoding direction and related
> scores would be much appreciated.
>
> Best,
> Daniel

--
Barry Haddow
University of Edinburgh
+44 (0) 131 651 3173

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support