Here's Orego's Java code for this (attached). It uses a "two-tailed test for difference of proportions".
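For readers of the archive, where the attachment shows up only as binary data, here is a minimal sketch of what a two-tailed test for difference of proportions can look like. It is not the contents of Significance.java. The class name, method names, and example numbers are made up for illustration, and the pooled standard error and the polynomial approximation of erf (Abramowitz and Stegun 7.1.26) are assumptions about how one might do it, not a description of what Orego does.

```java
public class TwoProportionTest {

    /**
     * z statistic for the difference between two win rates, using the
     * pooled proportion for the standard error (an assumption of this sketch).
     */
    public static double zScore(int wins1, int games1, int wins2, int games2) {
        double p1 = (double) wins1 / games1;
        double p2 = (double) wins2 / games2;
        double pooled = (double) (wins1 + wins2) / (games1 + games2);
        double se = Math.sqrt(pooled * (1 - pooled) * (1.0 / games1 + 1.0 / games2));
        if (se == 0) {
            return 0; // both conditions won (or lost) every game: no evidence of a difference
        }
        return (p1 - p2) / se;
    }

    /** Standard normal CDF, via an approximation of erf accurate to roughly 1e-7. */
    public static double normalCdf(double z) {
        double x = Math.abs(z) / Math.sqrt(2);
        double t = 1.0 / (1.0 + 0.3275911 * x);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))));
        double erf = 1.0 - poly * Math.exp(-x * x);
        return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
    }

    /** Two-tailed p-value: probability of a |z| at least this large if the true win rates are equal. */
    public static double twoTailedP(int wins1, int games1, int wins2, int games2) {
        double z = zScore(wins1, games1, wins2, games2);
        return 2 * (1 - normalCdf(Math.abs(z)));
    }

    public static void main(String[] args) {
        // Hypothetical results: 540 wins out of 1000 versus 500 wins out of 1000.
        double z = zScore(540, 1000, 500, 1000);
        double p = twoTailedP(540, 1000, 500, 1000);
        System.out.printf("z = %.3f, p = %.4f%n", z, p);
    }
}
```

The difference would be called significant at the 5% level when the p-value is below 0.05. As noted in the quoted discussion, this only holds if the number of games is fixed in advance.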
I usually run 500-1000 games in each condition. (The exact number depends on the hardware available at the time.)

On Tue, Nov 3, 2015 at 5:50 AM, Urban Hafner <[email protected]> wrote:

> Yes, I noticed that too. But luckily that's the one thing I didn't even
> consider doing. Running the same number of games feels like the most
> natural thing to do anyway.
>
> Sent from my iPhone
>
> > On 03.11.2015 at 14:22, Petr Baudis <[email protected]> wrote:
> >
> >> On Tue, Nov 03, 2015 at 09:46:00AM +0100, Rémi Coulom wrote:
> >> The intervals given by gogui are the standard deviation, not the usual
> >> 95% confidence intervals.
> >>
> >> For 95% confidence intervals, you have to multiply the standard
> >> deviation by two.
> >>
> >> And you still have the 5% chance of not being inside the interval, so
> >> you can still get the occasional non-overlapping intervals.
> >>
> >> Likelihood of superiority is an interesting statistical tool:
> >> https://chessprogramming.wikispaces.com/LOS+Table
> >>
> >> For more advanced tools for deciding when to stop testing, there is
> >> SPRT:
> >> http://www.open-chess.org/viewtopic.php?f=5&t=2477
> >> https://en.wikipedia.org/wiki/Sequential_probability_ratio_test
> >
> > An important corollary to this (noted on this list every few years)
> > is that in the most naive scenario where your statistical test is just
> > SD-based overlap after N games, you should fix your N number of games
> > in advance and not rig it by terminating out of schedule. If you look
> > at the progress of your playtesting often, you could spot a few moments
> > where the intervals do not overlap, even if in the long run they
> > typically would.
> >
> > (The situation is a bit dire if you have limited computing resources.
> > I admit that sometimes I didn't follow the above myself in less formal
> > exploratory experiments, but at least I tried to look only
> > "infrequently", e.g. single check every few hours, only at "round"
> > numbers of playouts, etc. I hope it's not a grave sin.)
> >
> > --
> > Petr Baudis
> > If you have good ideas, good data and fast computers,
> > you can do almost anything. -- Geoffrey Hinton
> > _______________________________________________
> > Computer-go mailing list
> > [email protected]
> > http://computer-go.org/mailman/listinfo/computer-go

--
Peter Drake
https://sites.google.com/a/lclark.edu/drake/
Significance.java
Description: Binary data
