Re: [Moses-support] Tune set size

Tom Hoar Sun, 05 Oct 2014 03:10:33 -0700

We use a random sample size calculation to determine the optimal sample size based on each bitext corpus size: http://en.wikipedia.org/wiki/Sample_size_determination. In an interesting choice of words, Wikipedia's introduction states, "The sample size is an important feature of any empirical study in which the goal is to make inferences about a population from a sample."

As it turns out, for most corpora we encounter, the tuning set sizes fall somewhere in the middle of the range Philipp suggested, i.e. 2-3K lines.
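
The message does not say which formula or parameters are used, so the following is only a minimal sketch of one common sample size calculation. It assumes Cochran's formula with a finite population correction, a 95% confidence level (z = 1.96), the worst-case proportion p = 0.5, and a 2% margin of error. These values are assumptions chosen for illustration, and the names tuning_set_size and sample_bitext are hypothetical.

```python
import math
import random


def tuning_set_size(corpus_lines, margin=0.02, z=1.96, p=0.5):
    """Estimate a tuning set size for a bitext of `corpus_lines` sentence pairs.

    Assumed method (not confirmed by the original message):
    Cochran's formula with finite population correction.
    """
    if corpus_lines <= 0:
        raise ValueError("corpus_lines must be positive")
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    # Correct for the finite size of the corpus.
    n = n0 / (1 + (n0 - 1) / corpus_lines)
    # Never ask for more lines than the corpus holds.
    return min(corpus_lines, math.ceil(n))


def sample_bitext(source, target, size, seed=0):
    """Draw `size` aligned sentence pairs at random, without replacement."""
    if len(source) != len(target):
        raise ValueError("source and target must have the same number of lines")
    rng = random.Random(seed)
    # Use the same indices on both sides so the pairs stay aligned.
    indices = sorted(rng.sample(range(len(source)), size))
    return [source[i] for i in indices], [target[i] for i in indices]


if __name__ == "__main__":
    print(tuning_set_size(600_000))
```

With these assumed parameters, a 600k-line corpus gives 2392 lines, which is inside the 2-3K range. A tighter or looser margin of error moves the result considerably, so the parameters matter more than the corpus size once the corpus is large.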


Tom


On 10/05/2014 04:22 PM, Roee Aharoni wrote:

Hi,
In a recent post it was mentioned that "600k line tuning set is way too big. It will take forever. It's better to reduce it to 2-3k lines." Is there a reference to an empirical experiment searching for an "optimal" MERT tune set size?
Thanks,



_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
