Davood,

I don't know enough about your data and use cases to recommend one way or the other. Running MERT multiple times will give you different BLEU scores, but I have never found the deltas to make a difference in a production environment.

Tom


On 10/14/2015 12:50 PM, Davood Mohammadifar wrote:
Thanks Michael for the paper and thanks Tom.

Based on the paper, one solution is replication of MERT and testing at least three times.
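As a rough illustration of what replication buys you, here is a minimal Python sketch that summarizes BLEU scores from repeated tune-and-test runs as a mean and a sample standard deviation. The function name summarize_bleu and the variable names are hypothetical, not part of Moses or MERT, and the two scores are the ones reported in the original message of this thread. With only two runs the spread estimate is very rough, so scores from further replications should be appended to the list.

```python
import statistics


def summarize_bleu(scores):
    """Return (mean, sample standard deviation) of BLEU scores from repeated runs."""
    if len(scores) < 2:
        # A single run says nothing about run-to-run variation.
        raise ValueError("need at least two runs to estimate run-to-run spread")
    return statistics.mean(scores), statistics.stdev(scores)


# The two scores reported in the original message of this thread.
# Append the score of each further tune-and-test replication here.
baseline_runs = [21.95, 21.82]

mean_bleu, spread = summarize_bleu(baseline_runs)
print(f"mean BLEU = {mean_bleu:.3f}, run-to-run std dev = {spread:.3f}")
```

If the gap between two systems is no larger than the run-to-run spread of a single system, it is hard to attribute that gap to the change being tested.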

My ideas have only subtle effects on BLEU. Do you recommend that I run MERT and testing three times or more? Should I increase the number of sentences for tuning?

My dataset for Persian to English includes:
Training: about 240000 sentences
Tune: 1000 sentences
Test: 1000 sentences

------------------------------------------------------------------------
From: [email protected]
Date: Sun, 11 Oct 2015 12:53:37 +0700
To: [email protected]
Subject: Re: [Moses-support] BLEU score difference about 0.13 for one dataset is normal?

Yes. Each tuning run, even with the same test set, will give you small variations in the final BLEU. Yours look like they're in a normal range.



Date: Sun, 11 Oct 2015 04:23:56 +0000
From: Davood Mohammadifar <[email protected]>
Subject: [Moses-support] BLEU score difference about 0.13 for one
dataset is normal?
To: Moses Support <[email protected]>

Hello everyone

I noticed different BLEU scores for the same dataset. The difference is small, about 0.13.

I trained on my dataset and tuned on the development set for Persian-English translation. After testing, the score was 21.95. I then repeated the same process and obtained 21.82. (My tools were mgiza, mert, ...)

Is this difference normal?

My system:
CPU: Core i7-4790K
RAM: 16GB
OS: Ubuntu 12.04

Thanks

_______________________________________________ Moses-support mailing list [email protected] http://mailman.mit.edu/mailman/listinfo/moses-support

