Davood,
I don't know enough about your data and use cases to recommend one way
or another. Running MERT multiple times will give you different BLEU
scores, but I have never found the deltas to make a difference in a
production environment.
Tom
On 10/14/2015 12:50 PM, Davood Mohammadifar wrote:
Thanks, Michael, for the paper, and thanks, Tom.
Based on the paper, one solution is to replicate MERT tuning and testing
at least three times.
My ideas have only subtle effects on BLEU. Do you recommend that I run
MERT and testing three times or more? Should I increase the number of
sentences for tuning?
My Persian-to-English dataset includes:
Training: about 240000 sentences
Tune: 1000 sentences
Test: 1000 sentences
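A minimal sketch of how replicate runs could be summarized, assuming each
replicate is a full MERT tuning run followed by testing on the same test
set, and that the mean and sample standard deviation of BLEU are the
statistics you want to report. The function name summarize_bleu is
hypothetical, and this only describes run-to-run spread; it is not a
significance test between two systems. The two scores are the ones
reported in the original message of this thread.

```python
import statistics


def summarize_bleu(scores):
    """Return (mean, sample standard deviation) of BLEU scores from
    independent tune+test replicates. Hypothetical helper for illustration."""
    if len(scores) < 2:
        # A standard deviation needs at least two replicate runs.
        raise ValueError("need at least two replicate runs")
    return statistics.mean(scores), statistics.stdev(scores)


# The two test scores reported in this thread; a third replicate
# (or more) would simply be appended to this list.
runs = [21.95, 21.82]
mean, sd = summarize_bleu(runs)
print(f"mean BLEU = {mean:.3f}, std dev = {sd:.3f}")
```

Comparing the difference produced by an idea against this run-to-run
spread gives a rough sense of whether a subtle BLEU change is larger
than what retuning alone produces.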
------------------------------------------------------------------------
From: [email protected]
Date: Sun, 11 Oct 2015 12:53:37 +0700
To: [email protected]
Subject: Re: [Moses-support] BLEU score difference about 0.13 for one
dataset is normal?
Yes. Each tuning run with the same test set will give you small
variations in the final BLEU score. Yours look like they're in a normal
range.
Date: Sun, 11 Oct 2015 04:23:56 +0000
From: Davood Mohammadifar <[email protected]>
Subject: [Moses-support] BLEU score difference about 0.13 for one
dataset is normal?
To: Moses Support <[email protected]>
Hello everyone,
I noticed different BLEU scores for the same dataset. The difference is
not large, about 0.13.
I trained on my dataset and tuned on the development set for
Persian-English translation. After testing, the score was 21.95. The
second time, I ran the same process and obtained 21.82 (my tools were
MGIZA, MERT, ...).
Is this difference normal?
My system:
CPU: Core i7-4790K
RAM: 16GB
OS: Ubuntu 12.04
Thanks
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support