Thanks Michael for the paper and thanks Tom. 

Based on the paper, one solution is to repeat MERT tuning and testing at least 
three times. 
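
A minimal sketch of how the scores from repeated runs could be summarized, assuming each run's final BLEU has already been collected by hand. The function name summarize_runs is hypothetical and not part of Moses; the two scores are the ones reported further down in this thread. It only reports the mean and the run-to-run spread, and it is not a significance test.

```python
from statistics import mean, stdev


def summarize_runs(bleu_scores):
    """Return (mean, sample standard deviation) of BLEU over repeated runs.

    Hypothetical helper for illustration; each score is assumed to come from
    one full tune-and-test cycle on the same data.
    """
    if len(bleu_scores) < 2:
        raise ValueError("need at least two runs to estimate the spread")
    return mean(bleu_scores), stdev(bleu_scores)


# The two scores reported in this thread for the same system and data.
runs = [21.95, 21.82]
avg, spread = summarize_runs(runs)
print(f"mean BLEU = {avg:.3f}, sample std dev = {spread:.3f} over {len(runs)} runs")
```

With three or more runs per system, an improvement that is much smaller than this spread would be hard to tell apart from tuning noise.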

My ideas have subtle effects on BLEU. Do you recommend that I run MERT and testing 
three times or more? Should I increase the number of sentences for tuning?

My dataset for Persian to English includes:
Training: about 240000 sentences
Tune: 1000 sentences
Test: 1000 sentences

From: [email protected]
Date: Sun, 11 Oct 2015 12:53:37 +0700
To: [email protected]
Subject: Re: [Moses-support] BLEU score difference about 0.13 for one dataset is normal?

Yes. Each tuning with the same test set will give you small variations in the 
final BLEU. Yours looks like they're in a normal range. 

Date: Sun, 11 Oct 2015 04:23:56 +0000
From: Davood Mohammadifar <[email protected]>
Subject: [Moses-support] BLEU score difference about 0.13 for one dataset is normal?
To: Moses Support <[email protected]>



Hello everyone



I noticed different BLEU scores for the same dataset. The difference is small, 
about 0.13.



I trained on my dataset and tuned on the development set for Persian-English 
translation. After testing, the score was 21.95. I then repeated the same process 
and obtained 21.82. (My tools were mgiza, mert, ...)



Is this difference normal?



My system:
CPU: Core i7-4790K
RAM: 16GB
OS: Ubuntu 12.04



Thanks


_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
                                          