Re: [Moses-support] Higher BLEU/METEOR score than usual for EN-DE

Kenneth Heafield Fri, 27 Apr 2012 08:21:36 -0700

Hi,

        Since this is EN-DE, how are you processing German compounds?


Kenneth

On 04/27/2012 07:43 AM, Daniel Schaut wrote:
> Hi guys,
>
> Thank you for your comprehensive comments.
>
> The most likely thing is that you have some of your test set included in
> your training set,
>
> Indeed, there are some similarities owing to the domain (instruction
> manuals). As is typical for manuals of all kinds, there is a high
> degree of similarity, e.g. at the sub-segment level. I extracted test
> set A and the tuning sets from the whole corpus before training my
> engine to make sure that test set A doesn’t overlap with the training
> set. Hmmm… that’s an epic fail then… Test set B was provided at a much
> later stage, when the training process was already done.
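
One way to check for the overlap discussed above is to count test sentences that also occur verbatim in the training data. The following is only a minimal sketch: the function names and the toy sentences are hypothetical, and the normalization (lowercasing, collapsing whitespace) is an assumption, not something described in the thread. It catches exact duplicates only, not the sub-segment similarity mentioned above.

```python
def normalize(line):
    # Assumed normalization: lowercase and collapse whitespace so that
    # trivial formatting differences do not hide duplicates.
    return " ".join(line.lower().split())


def overlap_report(train_lines, test_lines):
    """Return the test sentences that also occur in training, and their share."""
    train_set = {normalize(line) for line in train_lines}
    hits = [line for line in test_lines if normalize(line) in train_set]
    rate = len(hits) / len(test_lines) if test_lines else 0.0
    return hits, rate


# Toy data standing in for the real corpus files (hypothetical).
train = ["Press the power button .", "Remove the battery cover .", "Close the lid ."]
test = ["Press  the Power button .", "Insert the memory card ."]

hits, rate = overlap_report(train, test)
print(f"{len(hits)} of {len(test)} test sentences also occur in training ({rate:.0%})")
```

For a real corpus, the two lists would be read from the training and test files, one sentence per line. A high share of hits would suggest that the test set is not independent of the training data.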
>
> Did you try looking at the sentences? 1,000 is few enough to eyeball
> them. Have you tried the same system with a different corpus (e.g.
> EuroParl)? Have you checked that your test set and your training set do
> not intersect?
>
> Apart from scoring, I checked almost every sentence in both test sets
> for my thesis. Output quality is moderate for sentences of up to 50
> words and worse beyond that. Sentences of up to 20 words in particular
> are of good quality.
>
> I’ve just prepared a third and a fourth test set, from the OpenOffice
> corpus files and from another batch of in-domain files. For the OO
> files (2,000 sentences), BLEU is 0.0858 and METEOR is 0.3031. Kind of
> disappointing…
> The fourth test set of 2,000 sentences yields scores similar to those
> of the other in-domain test sets.
>
> Very short sentences will give you high scores.
>
> This might indeed be another factor boosting the scores. On
> average, almost half of the sentences in test sets A and B are quite
> short.
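
To see how much short sentences contribute to the scores, the test set can be split by sentence length and each bucket scored separately. This is a rough sketch only: the helper names are hypothetical, the 20- and 50-word boundaries are taken from the quality observations earlier in this message, and whitespace tokenization is an assumption.

```python
def length_label(n_tokens, edges=(20, 50)):
    # Map a token count to a bucket label such as "<=20", "<=50" or ">50".
    for edge in edges:
        if n_tokens <= edge:
            return f"<={edge}"
    return f">{edges[-1]}"


def bucket_indices(sentences, edges=(20, 50)):
    """Group sentence indices by length bucket (whitespace tokens assumed)."""
    buckets = {}
    for i, sentence in enumerate(sentences):
        label = length_label(len(sentence.split()), edges)
        buckets.setdefault(label, []).append(i)
    return buckets


# Toy reference sentences with 4, 25 and 60 tokens (hypothetical data).
refs = ["Press the button .", " ".join(["w"] * 25), " ".join(["w"] * 60)]
buckets = bucket_indices(refs)
for label, idx in buckets.items():
    print(label, len(idx))
```

The indices in each bucket can then be used to write per-bucket hypothesis and reference files, which are scored with the same BLEU and METEOR tools as the full set.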
>
> So, to conclude, could one say that I’ve created an engine suited to a
> specific domain, but that its performance outside that domain is close
> to zero?
>
> Best,
>
> Daniel
>
> *From:* [email protected] [mailto:[email protected]] *On behalf of*
> Miles Osborne
> *Sent:* 26 April 2012 21:17
> *To:* John D Burger
> *Cc:* Daniel Schaut; [email protected]
> *Subject:* Re: [Moses-support] Higher BLEU/METEOR score than usual for EN-DE
>
> Very short sentences will give you high scores.
>
> Also multiple references will boost them
>
> Miles
>
> On Apr 26, 2012 8:13 PM, "John D Burger" <[email protected]
> <mailto:[email protected]>> wrote:
>
> I =think= I recall that pairwise BLEU scores for human translators are
> usually around 0.50, so anything much better than that is indeed suspect.
>
> - JB
>
> On Apr 26, 2012, at 14:18 , Daniel Schaut wrote:
>
>  > Hi all,
>  >
>  >
>  > I’m running some experiments for my thesis and I’ve been told by a
> more experienced user that the achieved scores for BLEU/METEOR of my MT
> engine were too good to be true. Since this is the very first MT engine
> I’ve ever made and I am not experienced with interpreting scores, I
> really don’t know how to judge them. The first test set achieves a
> BLEU score of 0.6508 (v13). METEOR’s final score is 0.7055 (v1.3, exact,
> stem, paraphrase). A second test set indicated a slightly lower BLEU
> score of 0.6267 and a METEOR score of 0.6748.
>  >
>  >
>  > Here are some basic facts about my system:
>  >
>  > Decoding direction: EN-DE
>  >
>  > Training corpus: 1.8 mil sentences
>  >
>  > Tuning runs: 5
>  >
>  > Test sets: a) 2,000 sentences, b) 1,000 sentences (both in-domain)
>  >
>  > LM type: trigram
>  >
>  > TM type: unfactored
>  >
>  >
>  > I’m now trying to figure out if these scores are realistic at all, as
> various papers report far lower BLEU scores, e.g. Koehn and Hoang
> 2011. Any comments regarding the mentioned decoding direction and
> related scores will be much appreciated.
>  >
>  >
>  > Best,
>  >
>  > Daniel
>  >
>  > _______________________________________________
>  > Moses-support mailing list
>  > [email protected] <mailto:[email protected]>
>  > http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
