I often see that one side gets lucky early and over a few hundred games the win rate moves back toward what I expected. Small numbers of games (like a few hundred) can be very misleading.
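
As a rough illustration (a sketch, not anything from gomill or fuego), the formula quoted further down can be applied to the 625-566 score, and a small simulation can show how often two identical bots stay on one side of 50% for a whole run. The function names, the 100-game burn-in, and the run counts are assumptions chosen for the example.

```python
import math
import random


def half_width_95(wins, losses):
    """Half-width of the 95% interval: 1.96 * sqrt(wr * (1 - wr) / trials)."""
    trials = wins + losses
    wr = wins / trials
    return 1.96 * math.sqrt(wr * (1 - wr) / trials)


def fraction_one_sided(games=1200, burn_in=100, runs=1000, seed=1):
    """Simulate identical bots (true win rate 50%, no draws).

    Returns the fraction of runs in which the running win rate of bot A
    stays strictly above or strictly below 50% from game `burn_in` onward.
    """
    rng = random.Random(seed)
    one_sided = 0
    for _ in range(runs):
        lead = 0        # wins minus losses for bot A so far
        signs = set()   # signs of the lead seen after the burn-in
        for game in range(1, games + 1):
            lead += 1 if rng.random() < 0.5 else -1
            if game >= burn_in:
                signs.add((lead > 0) - (lead < 0))
        if signs == {1} or signs == {-1}:
            one_sided += 1
    return one_sided / runs


if __name__ == "__main__":
    wins, losses = 625, 566
    wr = wins / (wins + losses)
    hw = half_width_95(wins, losses)
    print(f"win rate {wr:.3f}, 95% interval {wr - hw:.3f} to {wr + hw:.3f}")
    print(f"one-sided runs: {fraction_one_sided():.2f}")
```

For 625-566 the half-width comes out at about 2.8%, so the interval still contains 50%. The simulation is only meant to show that a running win rate between equal bots does not have to cross 50% often; the exact fraction depends on the assumed burn-in and number of games.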
David

> -----Original Message-----
> From: computer-go-boun...@dvandva.org [mailto:computer-go-boun...@dvandva.org] On Behalf Of Vlad Dumitrescu
> Sent: Thursday, August 04, 2011 2:02 PM
> To: computer-go@dvandva.org
> Subject: Re: [Computer-go] testing improvements
>
> On Thu, Aug 4, 2011 at 22:41, David Fotland <fotl...@smart-games.com> wrote:
> > Remember that the confidence interval is two sided, so 3% means plus or
> > minus 3%. So 52% win rate is within +- 3% of 50%.
>
> Yes, of course. What I reacted to was that during the whole test, one
> bot always had around 52% wins (well, after some 100 games, at least).
> I would have thought it would move around the real value.
>
> Thanks,
> Vlad
>
> >> -----Original Message-----
> >> From: computer-go-boun...@dvandva.org [mailto:computer-go-boun...@dvandva.org] On Behalf Of Vlad Dumitrescu
> >> Sent: Thursday, August 04, 2011 1:14 PM
> >> To: computer-go@dvandva.org
> >> Subject: Re: [Computer-go] testing improvements
> >>
> >> Hi,
> >>
> >> On Thu, Aug 4, 2011 at 19:29, David Fotland <fotl...@smart-games.com> wrote:
> >> > Did each fuego play the same number of games vs gnugo, and did each play
> >> > half its games on each color?
> >>
> >> Yes, I set up an all-play-all competition with gomill.
> >>
> >> On Thu, Aug 4, 2011 at 19:55, Erik van der Werf <erikvanderw...@gmail.com> wrote:
> >> > On Thu, Aug 4, 2011 at 6:57 PM, Vlad Dumitrescu <vladd...@gmail.com> wrote:
> >> >> The scores towards gnugo are almost
> >> >> identical, but the two fuegos score 449-415, which is 52% and the 95%
> >> >> confidence is ~3%, i.e. ~10 ELO.
> >> >
> >> > That 3% is not a 95% confidence interval, more like 1 standard
> >> > deviation... (so nothing with high confidence yet)
> >>
> >> I took the easy way out and used a formula mentioned by David Fotland
> >> on this list a while ago
> >>
> >> >There is a simple formula to estimate the confidence interval of a result.
> >> >I use it to see if a new version is likely better than a reference version
> >> >(but I use 95% confidence intervals, so over hundreds of experiments it gives
> >> >me the wrong answer too often).
> >> >1.96 * sqrt(wr * (1 - wr) / trials)
> >> >Where wr is the win rate of one version vs the reference, and trials is the
> >> >number of test games.
> >>
> >> On Thu, Aug 4, 2011 at 20:21, Kahn Jonas <jonas.k...@math.u-psud.fr> wrote:
> >> > All the more since you're testing the same idea on two bots
> >> > simultaneously. So if you want to be wrong at most five percent of the
> >> > time, and consider you are better as soon as one of the bots gets
> >> > better, you have to make individual tests at the 2.5% level.
> >>
> >> At the moment I ran the bots without any modification, to see if
> >> everything works fine. So I think that the results between the
> >> identical bots should have been closer to 50% or at least to swing
> >> sometimes to the other side of 50%. Right now it's 625-566, which is
> >> 52.5% and 2.83% confidence according to the formula above.
> >>
> >> The results are
> >> fuego-1.1 v fuego-new (1199/2000 games)
> >> unknown results: 1 0.08%
> >> board size: 9   komi: 6.5
> >>             wins           black          white        avg cpu
> >> fuego-1.1    569 47.46%    386 64.33%     183 30.55%   2.69
> >> fuego-new    629 52.46%    415 69.28%     214 35.67%   2.67
> >>                            801 66.81%     397 33.11%
> >>
> >> I realize that statistical results don't always match what one would
> >> expect, but this should be a straightforward case...
> >>
> >> Thanks a lot for all the answers!
> >>
> >> regards,
> >> /Vlad
> >> _______________________________________________
> >> Computer-go mailing list
> >> Computer-go@dvandva.org
> >> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go