We use a random sample size calculation to determine the optimal sample
size based on each bitext corpus size
(http://en.wikipedia.org/wiki/Sample_size_determination). In an
interesting choice of words, Wikipedia's introduction states, "The
sample size is an important feature of any empirical study in which the
goal is to make inferences about a population from a sample."
As it turns out, for most corpora we encounter, the tuning set sizes fall
somewhere in the middle of the range Philipp suggested, i.e. 2-3K lines.
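For illustration only, here is a minimal sketch of one common sample size
calculation: Cochran's formula with a finite population correction. The
formula choice, the 95% confidence level, the 2% margin of error, the
p = 0.5 proportion and the function name are all assumptions made for this
example. They are not necessarily the settings behind the numbers above.

```python
import math
from statistics import NormalDist


def tuning_sample_size(corpus_lines, confidence=0.95, margin=0.02, p=0.5):
    """Estimate a random sample size for a corpus of `corpus_lines` lines.

    Hypothetical helper: Cochran's formula plus a finite population
    correction. All default parameter values are assumptions.
    """
    if corpus_lines <= 0:
        raise ValueError("corpus_lines must be positive")
    # Two-sided z-score for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population.
    n0 = (z * z) * p * (1 - p) / (margin * margin)
    # Finite population correction shrinks the sample for small corpora.
    n = n0 / (1 + (n0 - 1) / corpus_lines)
    # Never ask for more lines than the corpus contains.
    return min(corpus_lines, math.ceil(n))


if __name__ == "__main__":
    for lines in (10_000, 600_000, 5_000_000):
        print(lines, tuning_sample_size(lines))
```

Under these assumed settings the result levels off as the corpus grows, so
a very large corpus needs only slightly more tuning lines than a mid-sized
one. The actual lines would then be drawn at random from the bitext.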
Tom
On 10/05/2014 04:22 PM, Roee Aharoni wrote:
Hi,
In a recent post it was mentioned that "600k line tuning set is way
too big. It will take forever. It's better to reduce it to 2-3k lines."
Is there a reference to an empirical experiment searching for an
"optimal" MERT tune set size?
Thanks,
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support