I often see one side get lucky early, and then over a few hundred more games
the win rate moves back toward what I expected.  Small numbers of games (even
a few hundred) can be very misleading.
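
To put rough numbers on that, here is a minimal Python sketch of the formula
quoted further down in this thread, 1.96 * sqrt(wr * (1 - wr) / trials).  The
function name confidence_half_width is just something made up for this
example, and the calculation assumes the games are independent and that the
normal approximation is good enough.  It uses the 625-566 score Vlad reported
between the two identical fuegos.

```python
import math


def confidence_half_width(wins, losses, z=1.96):
    """Half-width of the normal-approximation interval for a win rate.

    z=1.96 corresponds to a two-sided 95% interval.
    """
    trials = wins + losses
    wr = wins / trials
    return z * math.sqrt(wr * (1 - wr) / trials)


# Score between the two identical fuegos reported in this thread.
wins, losses = 625, 566
wr = wins / (wins + losses)
half_width = confidence_half_width(wins, losses)
print(f"win rate {wr:.1%} +- {half_width:.1%}")

# The interval is two sided, so check whether 50% lies inside it.
print("50% inside interval:", wr - half_width <= 0.5 <= wr + half_width)

# The same formula at a 52-48 score, to show how wide the interval
# still is after only 100 games.
print(f"after 100 games: +- {confidence_half_width(52, 48):.1%}")
```

With these inputs 50% is still inside the interval, so the score is consistent
with two equal bots, and after only 100 games the interval is several times
wider.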

David

> -----Original Message-----
> From: computer-go-boun...@dvandva.org [mailto:computer-go-
> boun...@dvandva.org] On Behalf Of Vlad Dumitrescu
> Sent: Thursday, August 04, 2011 2:02 PM
> To: computer-go@dvandva.org
> Subject: Re: [Computer-go] testing improvements
> 
> On Thu, Aug 4, 2011 at 22:41, David Fotland <fotl...@smart-games.com> wrote:
> > Remember that the confidence interval is two sided, so 3% means plus or
> > minus 3%.  So 52% win rate is within +- 3% of 50%.
> 
> Yes, of course. What I reacted to was that during the whole test, one
> bot always had around 52% wins (well, after the first 100 games or so, at
> least). I would have thought it would move around the real value.
> 
> Thanks,
> Vlad
> 
> >> -----Original Message-----
> >> From: computer-go-boun...@dvandva.org [mailto:computer-go-
> >> boun...@dvandva.org] On Behalf Of Vlad Dumitrescu
> >> Sent: Thursday, August 04, 2011 1:14 PM
> >> To: computer-go@dvandva.org
> >> Subject: Re: [Computer-go] testing improvements
> >>
> >> Hi,
> >>
> >> On Thu, Aug 4, 2011 at 19:29, David Fotland <fotl...@smart-games.com> wrote:
> >> > Did each fuego play the same number of games vs gnugo, and did each
> >> > play half its games on each color?
> >>
> >> Yes, I set up an all-play-all competition with gomill.
> >>
> >> On Thu, Aug 4, 2011 at 19:55, Erik van der Werf
> >> <erikvanderw...@gmail.com> wrote:
> >> > On Thu, Aug 4, 2011 at 6:57 PM, Vlad Dumitrescu <vladd...@gmail.com> wrote:
> >> >> The scores towards gnugo are almost
> >> >> identical, but the two fuegos score 449-415, which is 52% and the
> >> >> 95% confidence is ~3%, i.e. ~10 ELO.
> >> >
> >> > That 3% is not a 95% confidence interval, more like 1 standard
> >> > deviation... (so nothing with high confidence yet)
> >>
> >> I took the easy way out and used a formula mentioned by David Fotland
> >> on this list a while ago:
> >>
> >> >There is a simple formula to estimate the confidence interval of a
> >> >result. I use it to see if a new version is likely better than a
> >> >reference version (but I use 95% confidence intervals, so over hundreds
> >> >of experiments it gives me the wrong answer too often).
> >> >1.96 * sqrt(wr * (1 - wr) / trials)
> >> >Where wr is the win rate of one version vs the reference, and trials
> >> >is the number of test games.
> >>
> >> On Thu, Aug 4, 2011 at 20:21, Kahn Jonas <jonas.k...@math.u-psud.fr> wrote:
> >> > All the more since you're testing the same idea on two bots
> >> > simultaneously. So if you want to be wrong at most five percent of
> >> > the time, and consider you are better as soon as one of the bots gets
> >> > better, you have to make individual tests at the 2.5% level.
> >>
> >> At the moment I am running the bots without any modification, to see if
> >> everything works fine. So I think that the results between the
> >> identical bots should have been closer to 50%, or at least should swing
> >> sometimes to the other side of 50%. Right now it's 625-566, which is
> >> 52.5% with a +-2.83% confidence interval according to the formula above.
> >>
> >> The results are
> >> fuego-1.1 v fuego-new (1199/2000 games)
> >> unknown results: 1 0.08%
> >> board size: 9   komi: 6.5
> >>             wins              black          white        avg cpu
> >> fuego-1.1    569 47.46%       386 64.33%     183 30.55%      2.69
> >> fuego-new    629 52.46%       415 69.28%     214 35.67%      2.67
> >>                               801 66.81%     397 33.11%
> >>
> >> I realize that statistical results don't always match what one would
> >> expect, but this should be a straightforward case...
> >>
> >> Thanks a lot for all the answers!
> >>
> >> regards,
> >> /Vlad
> >> _______________________________________________
> >> Computer-go mailing list
> >> Computer-go@dvandva.org
> >> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
