Hi,

In the WMT evaluations, we typically use a test set of 3000 sentences, so you should choose a test set of similar size. The bulk of the data should go into training. The tuning set should also have 1000-3000 sentences.
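A minimal Python sketch of that split follows. It assumes the corpus is already loaded as a list of (source, target) sentence pairs. The function names `split_corpus` and `write_parallel`, the tuning size of 2000 (picked from the 1000-3000 range), and the random shuffle with a fixed seed are illustrative choices, not part of Moses.

```python
import random
from typing import List, Tuple

# A sentence pair: (source sentence, target sentence).
Pair = Tuple[str, str]


def split_corpus(
    pairs: List[Pair],
    test_size: int = 3000,
    tune_size: int = 2000,  # hypothetical choice within the 1000-3000 range
    seed: int = 1,
) -> Tuple[List[Pair], List[Pair], List[Pair]]:
    """Shuffle sentence pairs and hold out a test set and a tuning set.

    Whatever is not held out becomes the training set.
    """
    if test_size + tune_size >= len(pairs):
        raise ValueError("corpus too small for the requested test and tuning sets")
    shuffled = list(pairs)  # copy so the caller's list keeps its order
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    test = shuffled[:test_size]
    tune = shuffled[test_size:test_size + tune_size]
    train = shuffled[test_size + tune_size:]
    return train, tune, test


def write_parallel(pairs: List[Pair], stem: str, src_lang: str, tgt_lang: str) -> None:
    """Write one sentence per line to <stem>.<src_lang> and <stem>.<tgt_lang>.

    Line i of both files holds the two halves of the same sentence pair.
    """
    with open(f"{stem}.{src_lang}", "w", encoding="utf-8") as src_file, \
         open(f"{stem}.{tgt_lang}", "w", encoding="utf-8") as tgt_file:
        for src, tgt in pairs:
            src_file.write(src + "\n")
            tgt_file.write(tgt + "\n")


if __name__ == "__main__":
    # Hypothetical toy corpus standing in for the real one million pairs.
    corpus = [(f"source sentence {i}", f"target sentence {i}") for i in range(100_000)]
    train, tune, test = split_corpus(corpus)
    print(len(train), len(tune), len(test))  # 95000 2000 3000
```

Calling `write_parallel(train, "train", "fr", "en")` (and likewise for the tuning and test sets) produces line-aligned parallel files; the stem and language codes here are only examples. If the corpus contains duplicate sentences, a random split can place copies of the same sentence in both training and test, which would inflate BLEU, so deduplicating before splitting may be worth considering.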
-phi

On Sat, Feb 15, 2014 at 5:50 PM, Arefeh Kazemi <[email protected]> wrote:
> Hello,
>
> I want to estimate the bleu score for translating between two languages
> using Moses. I have a corpus with one million sentence pairs which should be
> divided into train, development and test sets. I used 1000 sentences for
> tuning and split the others into train and test sets. By increasing the size
> of the test set, the bleu score will decrease.
> I want to know what is the reasonable size for the train and test data sets
> to get a reliable bleu score?
>
> Regards
> Arefeh
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
