There is a simple formula to estimate the confidence interval of a result. I use it to see if a new version is likely better than a reference version (but I use 95% confidence intervals, so over hundreds of experiments it gives me the wrong answer too often).

    1.96 * sqrt(wr * (1 - wr) / trials)

Where wr is the win rate of one version vs. the reference, and trials is the number of test games. For 99% confidence, use 2.576 in place of 1.96.

I typically run 500 to 5000 test games, which gives a 95% confidence interval of roughly +/- 1.4 to 4.4 percentage points at a win rate near 50%. Then I can calculate the ELO difference at the upper and lower confidence bounds to see the range of ELO differences.

David

> -----Original Message-----
> From: [email protected] [mailto:computer-go-
> [email protected]] On Behalf Of Darren Cook
> Sent: Thursday, December 02, 2010 9:25 PM
> To: [email protected]
> Subject: [Computer-go] Elo points, improvements and confidence
>
> How many games do two programs need to play to be able to say with 95%
> confidence that a new feature/bug fix has given a 50 ELO improvement?
>
> What about 200 ELO? What about 99% confidence? I'm sure there must be a
> straightforward equation for this, but google doesn't understand what I
> am asking it, and my own statistics knowledge is letting me down.
>
> TIA,
> Darren
>
>
> --
> Darren Cook, Software Researcher/Developer
>
> http://dcook.org/gobet/ (Shodan Go Bet - who will win?)
> http://dcook.org/work/ (About me and my work)
> http://dcook.org/blogs.html (My blogs and articles)
> _______________________________________________
> Computer-go mailing list
> [email protected]
> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
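
The calculation in the reply above (margin on the win rate, then ELO at the lower and upper bounds) could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the win-rate-to-ELO conversion uses the standard logistic Elo formula, which the message does not spell out.

```python
import math

Z_95 = 1.96   # two-sided 95% normal quantile, as in the message
Z_99 = 2.576  # two-sided 99% normal quantile


def margin(wr, trials, z=Z_95):
    """Half-width of the normal-approximation interval for a win rate."""
    return z * math.sqrt(wr * (1.0 - wr) / trials)


def elo_diff(wr):
    """ELO difference implied by a win rate (assumed logistic Elo model)."""
    return -400.0 * math.log10(1.0 / wr - 1.0)


def elo_interval(wins, trials, z=Z_95):
    """Return (low, mid, high) ELO differences for an observed result."""
    wr = wins / trials
    m = margin(wr, trials, z)
    # Clamp the bounds so the logarithm stays defined near 0% or 100%.
    low_wr = max(wr - m, 1e-9)
    high_wr = min(wr + m, 1.0 - 1e-9)
    return elo_diff(low_wr), elo_diff(wr), elo_diff(high_wr)


if __name__ == "__main__":
    low, mid, high = elo_interval(wins=550, trials=1000)
    print(f"ELO difference: {mid:.1f} (95% range {low:.1f} to {high:.1f})")
```

If the low end of the range is above zero, the new version is likely better at the chosen confidence level. Passing z=Z_99 widens the range accordingly.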
