So, I’m currently running 200 games against GnuGo to see if a change to my
program made a difference. But I now wonder if that’s enough games as I ran
the same benchmark with the same code (but a different compiler version)
and received different results:
85.5% wins (171 games of 200) the first
The intervals given by gogui are the standard deviation, not the usual
95% confidence intervals.
For 95% confidence intervals, you have to multiply the standard
deviation by two.
And you still have the 5% chance of not being inside the interval, so
you can still get the occasional
Thank you Remi!
So the 85.5% +/- 2.5 reported by GoGui would be 85.5% +/- 5 for 95% and
85.5% +/- 7.5. Correct?
And thanks for the table. I think that’s good enough for now. I’ve now
figured out how to calculate the std. deviation myself (it is easy) and
with those two tools together I can now
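As a minimal sketch of the standard deviation calculation mentioned above: the snippet assumes the value GoGui reports is the binomial standard error of the win rate, sqrt(p * (1 - p) / n). The function name `win_rate_with_error` is made up for illustration and is not part of GoGui.

```python
import math

def win_rate_with_error(wins, games):
    """Return (win rate, standard deviation of that rate) for a series of games."""
    p = wins / games
    # Binomial standard error of a proportion: sqrt(p * (1 - p) / n).
    # Assumption: this is the quantity GoGui prints after the "+/-".
    sd = math.sqrt(p * (1.0 - p) / games)
    return p, sd

# The 171 wins out of 200 games from the first message in this thread.
p, sd = win_rate_with_error(171, 200)
for k in (1, 2, 3):
    # k = 2 gives the approximate 95% interval discussed above.
    print(f"{100 * p:.1f}% +/- {100 * k * sd:.1f} ({k} x standard deviation)")
```

For 171 wins in 200 games this reproduces the +/- 2.5, +/- 5 and +/- 7.5 figures quoted in the thread, after rounding to one decimal.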
On Tue, Nov 3, 2015 at 2:22 PM, Petr Baudis wrote:
> (The situation is a bit dire if you have limited computing resources.
> I admit that sometimes I didn't follow the above myself in less formal
> exploratory experiments, but at least I tried to look only
> "infrequently", e.g.
Here's Orego's Java code for this:
It involves a "two-tailed test for difference of proportions".
I usually run 500-1000 games in each condition. (The exact number depends
on the hardware available at the time.)
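To illustrate what a two-tailed test for difference of proportions looks like, here is a rough Python sketch. It is not Orego's code; it assumes the textbook pooled two-proportion z-test with a normal approximation, and the function name and the second result (160 wins of 200) are hypothetical.

```python
import math

def two_proportion_z_test(wins_a, games_a, wins_b, games_b):
    """Return (z, two-tailed p-value) for the difference of two win rates."""
    p_a = wins_a / games_a
    p_b = wins_b / games_b
    # Pooled win rate under the null hypothesis that both versions are equal.
    pooled = (wins_a + wins_b) / (games_a + games_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / games_a + 1.0 / games_b))
    if se == 0.0:
        # All games won or all games lost in both runs: no evidence of a difference.
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-tailed p-value from the normal distribution: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# 171/200 is from this thread; 160/200 is a made-up second run for illustration.
z, p_value = two_proportion_z_test(171, 200, 160, 200)
print(f"z = {z:.2f}, p = {p_value:.3f}")
print("significant at 5%" if p_value < 0.05 else "not significant at 5%")
```

With 200 games per condition, even a gap of several percentage points does not reach the 5% level under these assumptions, which fits the advice to run 500-1000 games.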
On Tue, Nov 3, 2015 at 5:50 AM, Urban Hafner wrote:
>
On Tue, Nov 03, 2015 at 09:46:00AM +0100, Rémi Coulom wrote:
> The intervals given by gogui are the standard deviation, not the usual 95%
> confidence intervals.
>
> For 95% confidence intervals, you have to multiply the standard deviation by
> two.
>
> And you still have the 5% chance of not
On Tue, 3 Nov 2015, Urban Hafner wrote:
> Thank you Remi!
> So the 85.5% +/- 2.5 reported by GoGui would be 85.5% +/- 5 for 95% and 85.5%
> +/- 7.5.
> Correct?
Correct.
But the intervals do not have to be non-overlapping for the difference to be
significant. You may divide those intervals by $\sqrt{2}$ before testing
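The $\sqrt{2}$ rule can be sketched as follows. The reasoning assumed here is that the difference of two independent win rates with equal standard deviation s has standard deviation s * sqrt(2), so checking whether intervals shrunk by sqrt(2) overlap matches a z-test on the difference. That is exact only when both standard deviations are equal (for example, the same number of games and similar win rates) and approximate otherwise. The function name and the second result are hypothetical.

```python
import math

def shrunk_intervals_overlap(p1, sd1, p2, sd2, z=2.0):
    """Check overlap of the two z-sigma intervals after dividing each by sqrt(2).

    Returns True if the shrunk intervals overlap (difference not significant
    at roughly the level implied by z), False otherwise.
    """
    half1 = z * sd1 / math.sqrt(2.0)
    half2 = z * sd2 / math.sqrt(2.0)
    return abs(p1 - p2) <= half1 + half2

# 85.5% +/- 2.5 is from this thread; 80.0% +/- 2.8 is a made-up second run.
if shrunk_intervals_overlap(0.855, 0.025, 0.80, 0.028):
    print("shrunk intervals overlap: difference not significant")
else:
    print("shrunk intervals do not overlap: difference significant")
```

The default `z=2.0` corresponds to the "multiply by two" approximation for 95% used earlier in the thread.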
Yes, I noticed that too. But luckily that's the one thing I didn't even
consider doing. Running the same number of games feels like the most natural
thing to do anyway.
> On 03.11.2015 at 14:22, Petr Baudis wrote:
>
>> On Tue, Nov 03, 2015 at