A recent list thread recommended running mert several times and
averaging the various non-deterministic results. If we adopt multiple
mert runs, I want to optimize the sizes of the tuning/test sets without
taking too many segments from the total population.
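
To make the averaging step concrete, here is a minimal sketch. It assumes the value being averaged is one score per run (for example, a test-set BLEU score from each tuned system); the thread did not specify this, and the function name summarize_runs is only illustrative.

```python
from statistics import mean, stdev


def summarize_runs(scores):
    """Return (mean, sample standard deviation) of per-run scores."""
    # At least two runs are needed to estimate run-to-run spread.
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate the spread")
    return mean(scores), stdev(scores)
```

The standard deviation is reported alongside the mean because it shows how much the non-determinism moves the score from run to run.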

Currently, we extract a statistically significant number of randomly
selected segments (pairs) for one tuning set and one test set. We
calculate the sample size with a basic population sampling formula that
uses the population size and a user-selected confidence level and
confidence interval (e.g. 97% ±2%). We always assume an equal
proportion (50/50), which I understand yields the largest sample size.
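
For reference, a calculation of this kind can be sketched as below. This assumes the formula is Cochran's sample size with a finite population correction, which matches the inputs described above but may not be exactly what our tool does. The function name sample_size and the corpus size of 1,000,000 pairs are made up for illustration.

```python
import math
from statistics import NormalDist


def sample_size(population, confidence, margin, proportion=0.5):
    """Cochran's sample size with a finite population correction."""
    # Two-sided z-score for the confidence level (e.g. 0.97 -> about 2.17).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * proportion * (1 - proportion) / margin ** 2
    # Shrink it for a finite population of segment pairs.
    n = n0 / (1 + (n0 - 1) / population)
    return min(population, math.ceil(n))


if __name__ == "__main__":
    total_pairs = 1_000_000  # hypothetical corpus size, not our real one
    for confidence, margin in [(0.97, 0.02), (0.90, 0.05)]:
        print(confidence, margin, sample_size(total_pairs, confidence, margin))
```

The term proportion * (1 - proportion) peaks at 0.5, which is why the 50/50 assumption gives the largest sample for a given confidence level and interval.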

Of course, higher confidence levels with tighter intervals result in
larger tuning/test sample sizes. Reducing the confidence level, for
example to 90% with an interval of ±5%, gives significantly smaller
random sample sets. Smaller random sample sets are less representative
of the overall population, but mert-moses.pl runs faster, allowing us to
evaluate more sets.


Question: do you think it's better to run mert-moses.pl more times
with smaller sets, or fewer times with larger sets?
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
