The intervals given by gogui are one standard deviation of the estimated
win rate (the standard error), not the usual 95% confidence intervals.
For 95% confidence intervals, you have to multiply that value by about
two (1.96, to be precise).
Even then, there is still a 5% chance that the true value lies outside
the interval, so you can still get the occasional non-overlapping
intervals.
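
To make the arithmetic concrete, here is a small Python sketch. It assumes
the +/- value is the binomial standard error sqrt(p * (1 - p) / n) of the
win rate, which reproduces the +/- 2.5 and +/- 2.9 quoted below; I have
not checked the gogui-twogtp source. The function names are made up for
this example.

```python
import math

def win_rate_interval(wins, games, z=1.96):
    """Return (win rate, standard error, (low, high)) for a match result.

    Assumes independent games and the normal approximation to the
    binomial; z=1.96 gives a 95% interval.
    """
    p = wins / games
    se = math.sqrt(p * (1.0 - p) / games)
    return p, se, (p - z * se, p + z * se)

def difference_z_score(wins_a, games_a, wins_b, games_b):
    """Difference between two win rates, in units of its standard error."""
    p_a, se_a, _ = win_rate_interval(wins_a, games_a)
    p_b, se_b, _ = win_rate_interval(wins_b, games_b)
    return (p_a - p_b) / math.sqrt(se_a ** 2 + se_b ** 2)

if __name__ == "__main__":
    for wins in (171, 158):
        p, se, (low, high) = win_rate_interval(wins, 200)
        print(f"{wins}/200: {100 * p:.1f}% +/- {100 * se:.1f}, "
              f"95% interval {100 * low:.1f}% to {100 * high:.1f}%")
    print(f"z = {difference_z_score(171, 200, 158, 200):.2f}")
```

With the two results below, the 95% intervals overlap and the difference
is about 1.7 standard errors, which is below the 1.96 needed for
significance at the 95% level. Passing z=2.576 gives a 99% interval
instead.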
Likelihood of superiority is an interesting statistical tool:
https://chessprogramming.wikispaces.com/LOS+Table
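
As a rough illustration, the normal approximation commonly quoted for
likelihood of superiority can be computed as below. This is a sketch for
a direct match between two programs with draws ignored; the function name
and the 110-90 score are invented for the example.

```python
import math

def likelihood_of_superiority(wins, losses):
    """Approximate probability that the winner of a match is really stronger.

    Uses the normal approximation
        LOS = 0.5 * (1 + erf((wins - losses) / sqrt(2 * (wins + losses))))
    Draws are ignored, which suits Go since drawn games are rare.
    """
    if wins + losses == 0:
        return 0.5  # no games played, no information either way
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))

if __name__ == "__main__":
    # Hypothetical head-to-head result: 110 wins, 90 losses.
    print(f"LOS = {likelihood_of_superiority(110, 90):.3f}")
```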
For more advanced tools for deciding when to stop testing, there is SPRT:
http://www.open-chess.org/viewtopic.php?f=5&t=2477
https://en.wikipedia.org/wiki/Sequential_probability_ratio_test
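
For a flavour of how SPRT works, here is a minimal sketch of Wald's test
for win/loss results. It is not the exact procedure described in the
links above; p0, p1, alpha and beta are values you have to choose
yourself (p0 is the win rate under "no improvement", p1 the win rate you
hope for), and the function name is made up.

```python
import math

def sprt(results, p0, p1, alpha=0.05, beta=0.05):
    """Wald's sequential probability ratio test on a sequence of game results.

    results: iterable of booleans, True for a win, in the order played.
    H0: win probability is p0.  H1: win probability is p1 (with p1 > p0).
    Returns (decision, number of games used).
    """
    lower = math.log(beta / (1.0 - alpha))   # accept H0 at or below this
    upper = math.log((1.0 - beta) / alpha)   # accept H1 at or above this
    llr = 0.0                                # running log-likelihood ratio
    games = 0
    for won in results:
        games += 1
        if won:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1.0 - p1) / (1.0 - p0))
        if llr >= upper:
            return "accept H1", games
        if llr <= lower:
            return "accept H0", games
    return "continue", games                 # not enough evidence yet

if __name__ == "__main__":
    # Made-up sequence: wins and losses alternating, tested for 50% vs 60%.
    print(sprt([True, False] * 50, p0=0.5, p1=0.6))
```

The point of the method is that testing stops as soon as the evidence is
strong enough in either direction, instead of after a fixed number of
games.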
Rémi
On 11/03/2015 09:38 AM, Urban Hafner wrote:
So,
I’m currently running 200 games against GnuGo to see if a change to my
program made a difference. But I now wonder if that’s enough games as
I ran the same benchmark with the same code (but a different compiler
version) and received different results:
85.5% wins (171 games of 200) the first time (+/- 2.5 according to
gogui-twogtp)
79.0% wins (158 games of 200) the second time (+/- 2.9 according to
gogui-twogtp)
Looking at these results would make me believe that the difference is
significant (the intervals don’t overlap) but then the real difference
is only 13 wins …
My statistics knowledge is sketchy at best but assuming that what
gogui-twogtp calculates is the 95% confidence interval (I’m pretty
sure I’m mixing terms here) it could well be that the difference
between the two runs above is just random.
So, this leads me to two questions:
1. How many games do you normally run to test if a change is
significant “enough”?
2. Any good resources on how to calculate these statistics (i.e. if I
wanted to find the error margin for a 99% confidence interval)?
Urban
--
Blog: http://bettong.net/
Twitter: https://twitter.com/ujh
Homepage: http://www.urbanhafner.com/
_______________________________________________
Computer-go mailing list
[email protected]
http://computer-go.org/mailman/listinfo/computer-go