Re: [Moses-support] Significance of BLEU using Multi-bleu

Vineet Kashyap Thu, 18 Sep 2008 18:26:59 -0700

Hi Adam

Thanks for the detailed explanation.


I sorted out my training and test sets and have finally
managed to get the following scores:

NIST      BLEU     Multi-bleu   IBM-BLEU
6.0819   0.2070      20.70       0.2438

These were evaluated on a test set of 5082 sentences, since,
as suggested, reliable scores require testing on more data.
The system was trained on only 6053 sentences.

Obviously, accuracy can be further improved by tuning,
dropping unknown words, and increasing the corpus size.

Decoding took about 70 minutes, which I guess
can be improved by running Moses in parallel.

Now, my question is: can such a small corpus really give a BLEU score of 20%,
given that I have made sure the training and test sets are separate?

Also, what is the range of each of these metrics?

With multi-bleu, I guess a perfect score is 100, while
BLEU in general lies between 0 and 1 and is multiplied by 100
to get a percentage.
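
As a quick sanity check of that guess, here is a minimal Python sketch.
The function name is just illustrative (it is not part of Moses or any
scoring script), and it assumes the 0-1 and 0-100 figures are the same
quantity on different scales:

```python
def bleu_fraction_to_percent(score):
    """Convert a BLEU score on the 0-1 scale to the 0-100 scale."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("expected a BLEU score between 0 and 1")
    return score * 100.0


# Scores reported above.
bleu = 0.2070        # 0-1 scale
multi_bleu = 20.70   # 0-100 scale
ibm_bleu = 0.2438    # 0-1 scale

# BLEU and multi-bleu agree once both are on the same scale.
assert abs(bleu_fraction_to_percent(bleu) - multi_bleu) < 1e-6

print(f"BLEU:     {bleu_fraction_to_percent(bleu):.2f}")
print(f"IBM-BLEU: {bleu_fraction_to_percent(ibm_bleu):.2f}")
```

If that assumption holds, the BLEU and multi-bleu columns above report
the same score, and only IBM-BLEU actually differs.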

But for NIST, I am not sure.

Thanks again

Vineet   

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
