There is a simple formula to estimate the confidence interval of a result.
I use it to see if a new version is likely better than a reference version
(but I use 95% confidence intervals, so over hundreds of experiments it gives
me the wrong answer too often).

1.96 * sqrt(wr * (1 - wr) / trials)

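Where wr is the win rate of one version vs the reference (as a fraction
between 0 and 1), and trials is the number of test games.  The result is the
half-width of the interval, so the 95% interval is wr plus or minus that
value.  For 99% confidence, replace the 1.96 constant with 2.576.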
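
As a rough sketch of the formula above in Python (the function names
ci_half_width and trials_needed are mine, picked just for illustration; the
second one is simply the same formula solved for trials):

```python
import math

Z_95 = 1.96   # normal quantile for a 95% two-sided interval
Z_99 = 2.576  # normal quantile for a 99% two-sided interval


def ci_half_width(wr, trials, z=Z_95):
    """Half-width of the normal-approximation interval around win rate wr."""
    return z * math.sqrt(wr * (1.0 - wr) / trials)


def trials_needed(wr, half_width, z=Z_95):
    """Games needed so the interval is no wider than +/- half_width.

    This just inverts ci_half_width; wr is a guess at the true win rate
    (0.5 is the worst case, giving the largest game count).
    """
    return math.ceil((z / half_width) ** 2 * wr * (1.0 - wr))


if __name__ == "__main__":
    for n in (500, 5000):
        print(n, "games:", ci_half_width(0.5, n))
```

This is the usual normal approximation, so it gets less reliable when wr is
close to 0 or 1 or when the number of games is small.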

I typically run 500 to 5000 test games, which gives a 95% confidence
interval of roughly plus or minus 1 to 4 percent on the win rate.  Then I
can calculate the ELO difference at the upper and lower confidence bounds to
see the range of ELO differences.
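
Here is a minimal sketch of that last step.  It assumes the standard
logistic Elo model, where a win rate wr corresponds to a rating difference
of 400 * log10(wr / (1 - wr)); I did not say above which conversion to use,
so treat that choice, and the names elo_diff and elo_range, as assumptions.

```python
import math


def elo_diff(wr):
    """Elo difference implied by win rate wr under the logistic Elo model."""
    return 400.0 * math.log10(wr / (1.0 - wr))


def elo_range(wr, trials, z=1.96):
    """Return (low, mid, high) Elo differences for the confidence interval.

    The interval bounds are clamped just inside (0, 1) so the logarithm
    stays defined when the win rate is extreme or the game count is small.
    """
    half = z * math.sqrt(wr * (1.0 - wr) / trials)
    eps = 1e-9
    lower = min(max(wr - half, eps), 1.0 - eps)
    upper = min(max(wr + half, eps), 1.0 - eps)
    return elo_diff(lower), elo_diff(wr), elo_diff(upper)


if __name__ == "__main__":
    low, mid, high = elo_range(0.55, 1000)
    print("Elo difference: %.1f (range %.1f to %.1f)" % (mid, low, high))
```

If the low end of the range is above zero, the new version is likely better
at the chosen confidence level; if the range straddles zero, more games are
needed.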

David

> -----Original Message-----
> From: [email protected] [mailto:computer-go-
> [email protected]] On Behalf Of Darren Cook
> Sent: Thursday, December 02, 2010 9:25 PM
> To: [email protected]
> Subject: [Computer-go] Elo points, improvements and confidence
> 
> How many games do two programs need to play to be able to say with 95%
> confidence that a new feature/bug fix has given a 50 ELO improvement?
> 
> What about 200 ELO? What about 99% confidence? I'm sure there must be a
> straightforward equation for this, but google doesn't understand what I
> am asking it, and my own statistics knowledge is letting me down.
> 
> TIA,
> Darren
> 
> 
> --
> Darren Cook, Software Researcher/Developer
> 
> http://dcook.org/gobet/  (Shodan Go Bet - who will win?)
> http://dcook.org/work/ (About me and my work)
> http://dcook.org/blogs.html (My blogs and articles)
> _______________________________________________
> Computer-go mailing list
> [email protected]
> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
