On Dec 1, 2008, at 12:23 AM, Mark Boon <[EMAIL PROTECTED]> wrote:


On 30-nov-08, at 16:51, Jason House wrote:

You've claimed to be non-statistical, so I'm hoping the following is useful... You can compute the confidence that you made an improvement as:
Phi(# of standard deviations)
where Phi is the standard normal CDF (Phi(z) = (1 + erf(z/sqrt(2)))/2), and # of standard deviations =
(win rate - 0.5) / (0.5 / sqrt(#games))
The denominator, 0.5/sqrt(#games), is roughly the standard deviation of a measured win rate near 50% over #games games.

Erf has no closed-form expression, so in practice people use lookup tables to translate between standard deviations and confidence levels. More commonly, people set a goal confidence up front and translate it directly into a number of standard deviations (3.0 for 99.87%). This situation calls for a one-tailed test.

After about 20 or 30 games, this approximation is accurate and can be used for early termination of your test.
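The calculation above can be sketched in a few lines of Python (the helper name is mine, not anything from the thread; it uses the normal approximation, so trust it only after a few dozen games):

```python
import math

def confidence_of_improvement(wins, games):
    """Confidence that the true win rate exceeds 0.5, given an observed
    record of `wins` out of `games` (normal approximation, one-tailed)."""
    win_rate = wins / games
    # Standard deviation of a measured win rate near 50% is ~0.5/sqrt(n).
    sigma = 0.5 / math.sqrt(games)
    z = (win_rate - 0.5) / sigma  # number of standard deviations
    # One-tailed confidence via the standard normal CDF:
    # Phi(z) = (1 + erf(z/sqrt(2))) / 2
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# e.g. 60 wins in 100 games gives z = 2.0, confidence ~ 97.7%
```

With 60 wins in 100 games the win rate is 0.6, sigma is 0.05, so z = 2.0 and the confidence comes out around 97.7%.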


Lately I use twogtp for my test runs. It computes the winning percentage and puts a ± value after it in parentheses. Is that the value of one standard deviation? (I had always assumed so.) Even after 1,000 games it stays in the 1.5% neighbourhood.

Sounds like it.
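I haven't checked twogtp's source, but a one-standard-deviation error bar on a measured win rate would be the binomial standard error, which matches the numbers quoted above:

```python
import math

def win_rate_error_bar(wins, games):
    """One standard deviation of the observed win rate (binomial
    standard error): sqrt(p * (1 - p) / n)."""
    p = wins / games
    return math.sqrt(p * (1.0 - p) / games)

# At 1,000 games with p near 0.5 this is sqrt(0.25/1000) ~ 0.0158,
# i.e. about 1.6%, consistent with the "1.5% neighbourhood" observation.
```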


Maybe after 20-30 games the approximation is usually accurate. But if you perform tests often, you'll occasionally bump into the unlikely event where what you thought was a big improvement turns out to be no improvement at all, or the other way around. Only when I see 20+ games with a zero winning percentage do I stop the test, assuming I made a mistake.

The 20 or 30 game caveat would really only apply for extreme winning or losing streaks. Up until that point, confidence levels are not as high as one might expect from the approximation.
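For streaks and small samples, the exact binomial tail (under the null hypothesis of a fair 50% coin) avoids that over-optimism; a sketch, with a function name of my own choosing:

```python
import math

def exact_confidence(wins, games):
    """Exact one-tailed confidence that the engine is better than 50%:
    one minus the probability that a fair coin scores >= `wins` wins."""
    p_tail = sum(math.comb(games, k) for k in range(wins, games + 1)) / 2.0 ** games
    return 1.0 - p_tail

# For a 10-0 streak the normal approximation gives z = sqrt(10) ~ 3.16
# standard deviations (~99.92% confidence), while the exact tail is
# 1 - 1/1024 ~ 99.90% -- the approximation slightly overstates it.
```

The gap is small here but grows for shorter streaks, which is the caveat above: early on, the true confidence is a bit lower than the normal approximation suggests.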




Mark

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
