> Question: do you think it's better to run mert-moses.pl more times
> with smaller sets, or fewer times with larger sets?

You should run tuning with larger sets, multiple times.

No amount of rerunning tuning on a small set will tell you anything.

Miles

On 7 November 2011 13:45, Tom Hoar <[email protected]> wrote:
> A recent list thread recommended running mert several times and averaging
> the various non-deterministic results. If we adopt multiple mert tests, I
> want to optimize the sizes of the tuning/test sets without taking too many
> segments from the total population.
>
> Currently, we extract a statistically significant number of randomly
> selected segments (pairs) for one tuning set and one test set. We calculate
> a sample size with a basic population sampling formula that uses the
> population size and a user-selected confidence level and confidence
> interval (e.g. 97% ±2%). We always assume an equal probabilistic proportion
> (50/50), which I understand results in the largest sample size.
>
> Of course, higher confidence levels with tighter intervals result in larger
> tuning/testing sample sizes. Reducing the confidence level, for example to
> 90%, with an interval of ±5%, gives significantly smaller random sample
> sets. Smaller random sample sets are less representative of the overall
> population, but mert-moses.pl runs faster, allowing us to evaluate more sets.
>
> Question: do you think it's better to run mert-moses.pl more times with
> smaller sets, or fewer times with larger sets?
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
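
For reference, the sample-size calculation described in the quoted message can be sketched roughly as below. The message does not name the formula, so this assumes Cochran's formula with a finite population correction, which is one common choice that takes the same inputs (population size, confidence level, interval, 50/50 proportion). The function name sample_size and the corpus size of 1,000,000 are made up for illustration.

```python
import math
from statistics import NormalDist


def sample_size(population, confidence, margin, proportion=0.5):
    """Segments to draw for a given confidence level and +/- margin.

    Assumed formula: Cochran's estimate with finite population correction.
    """
    # Two-sided z-score, e.g. about 1.96 for confidence=0.95.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Cochran's estimate for an unlimited population.
    n0 = z ** 2 * proportion * (1 - proportion) / margin ** 2
    # Finite population correction shrinks the estimate for small corpora.
    n = n0 / (1 + (n0 - 1) / population)
    return min(population, math.ceil(n))


if __name__ == "__main__":
    corpus_size = 1_000_000  # hypothetical number of segment pairs
    for confidence, margin in [(0.97, 0.02), (0.90, 0.05)]:
        print(confidence, margin, sample_size(corpus_size, confidence, margin))
```

Under these assumptions the 97% ±2% setting asks for roughly ten times as many segments as 90% ±5%, which is the size gap the question is about.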



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

