Hi,

In the WMT evaluations, we typically use a test set of 3000 sentences,
so you should choose something of similar size. The bulk of the data
should go into training. A tuning set should also have 1000-3000
sentences.
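A minimal sketch of such a split, assuming the corpus is held as two aligned lists of sentences (line i of the source translates line i of the target). The function name split_parallel, its parameters and the fixed seed are hypothetical illustrations, not part of Moses. It holds out a random test set and a random tuning set and puts everything else into training:

```python
import random


def split_parallel(src, tgt, test_size=3000, tune_size=1000, seed=1):
    """Hypothetical helper: shuffle aligned sentence pairs and hold out
    a test set and a tuning set; all remaining pairs go into training."""
    if len(src) != len(tgt):
        raise ValueError("source and target must have the same number of lines")
    if test_size + tune_size >= len(src):
        raise ValueError("corpus too small for the requested held-out sets")

    # Shuffle indices rather than sentences so both sides stay aligned;
    # a fixed seed makes the split reproducible.
    indices = list(range(len(src)))
    random.Random(seed).shuffle(indices)

    test_idx = indices[:test_size]
    tune_idx = indices[test_size:test_size + tune_size]
    train_idx = indices[test_size + tune_size:]

    def take(idx):
        # Return (source sentences, target sentences) for these indices.
        return [src[i] for i in idx], [tgt[i] for i in idx]

    return {"train": take(train_idx), "tune": take(tune_idx), "test": take(test_idx)}
```

With one million sentence pairs and the defaults above, this leaves 996,000 pairs for training. Random sampling is only one option: holding out contiguous blocks (for example, whole documents) instead avoids near-duplicate sentences leaking from training into test.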

-phi

On Sat, Feb 15, 2014 at 5:50 PM, Arefeh Kazemi <[email protected]> wrote:
> Hello,
>
> I want to estimate the BLEU score for translating between two languages
> using Moses. I have a corpus of one million sentence pairs, which should be
> divided into training, development and test sets. I used 1000 sentences for
> tuning and split the rest into training and test sets. As I increase the size
> of the test set, the BLEU score decreases.
> What is a reasonable size for the training and test sets
> to get a reliable BLEU score?
>
> Regards
> Arefeh
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>