Re: [Computer-go] Master Thesis: Information Sharing in MCTS

2011-08-04 Thread Jonathan Chetwynd

Petr,

a huge congratulations!

Chetwynd not Chetwyng, but hey, I had no expectation of a plug.

looking forward to the browse.

best

~:

On 3 Aug 2011, at 23:13, Petr Baudis wrote:


 Hi!

 If anyone is interested, you can read my master thesis at:

http://pasky.or.cz/go/prace.pdf

 It could give a good introduction to current Monte Carlo techniques
in Computer Go in general, and discusses some approaches for improvement
(nothing too dramatic). It also gives a mid-level technical description
of Pachi (with some important stuff left out, but we are preparing
a paper).

 Kind regards,

--
Petr "Pasky" Baudis
UNIX is user friendly, it's just picky about who its friends are.


Re: [Computer-go] Master Thesis: Information Sharing in MCTS

2011-08-04 Thread David Fotland
Great thesis.  Many Faces also uses rule-based playouts, so Pachi is not the
only rule-based strong program.

You mention that in the playouts you check ataris and extensions to avoid
growing a losing ladder.  Do you do a full ladder search, or just some local
heuristics?  Many Faces does not have any ladder search in the playouts.

David




Re: [Computer-go] European Go Congress 2012

2011-08-04 Thread Nick Wedd

Hello Ingo,

I would have liked to travel to Bordeaux for the computer Go, but it did
not fit well with other things; also, Olivier thought I could help just
as much from my home in England.

However, I would like to help with the events in Bonn.  Is the schedule
for the computer Go known yet?  I could attend for either week, but the
middle weekend, August 27-29, will be difficult for me.


Best wishes,
Nick

On 21/07/2011 15:40, Ingo Althöfer wrote:

Hello,

right now the European Go Congress 2011 is about to start -
in Bordeaux, France. At the same time, preparations are
already running for the Congress 2012, which will
take place in Bonn, Germany, in July and August 2012.

On the website of the EGC-2012 organizers, I am writing
a column on computer go, at irregular intervals.
Currently there is only one entry, from early June.
http://www.egc2012.eu/home/news/2011/06/04/ingos-computer-go-column-times-are-hot-computer-go

If someone here has interesting material, ideas, or questions
for that column, please let me know.

Ingo.



--
Nick Wedd
n...@maproom.co.uk

Re: [Computer-go] Master Thesis: Information Sharing in MCTS

2011-08-04 Thread Petr Baudis
On Thu, Aug 04, 2011 at 12:46:21AM -0700, David Fotland wrote:
 You mention that in the playouts you check ataris and extensions to avoid
 growing a losing ladder.  Do you do a full ladder search, or just some local
 heuristics?  Many Faces does not have any ladder search in the playouts.

We do a full ladder search in the sense that we walk the board up to a
ladder breaker. However, we do not actually play the moves, and the
ladder-breaker test is very simple.

Overall, this check takes little time, but it also is not very important
strength-wise. Pachi still tends to misplay many ladders and we have not
found a complete solution for that. (I'm very reluctant to completely
prune moves on principle.)
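For illustration, here is a minimal hypothetical sketch (Python, not Pachi's
actual C code) of the kind of cheap walk described above: follow the ladder
path across the board without playing any moves and bail out on a very simple
breaker test. The names, the board representation and the breaker rule itself
are all illustrative assumptions, not Pachi's implementation.

EMPTY, BLACK, WHITE = '.', 'X', 'O'

def reads_as_captured(board, size, stone, liberty, chased):
    """Naive diagonal ladder walk. board: dict {(x, y): colour},
    stone: the chased stone in atari, liberty: its single liberty,
    chased: its colour. True means the walk says the ladder works."""
    hunter = BLACK if chased == WHITE else WHITE
    dx = liberty[0] - stone[0]          # direction of the first extension
    dy = liberty[1] - stone[1]
    x, y = liberty
    # Crude zig-zag: alternate steps along the two axes of the diagonal.
    steps = [(dx or 1, 0), (0, dy or 1)]
    for i in range(2 * size):           # a ladder cannot run longer than this
        nx, ny = x + steps[i % 2][0], y + steps[i % 2][1]
        if not (0 <= nx < size and 0 <= ny < size):
            return True                 # ran into the edge: captured
        p = board.get((nx, ny), EMPTY)
        if p == chased:
            return False                # friendly stone on the path: breaker
        if p == hunter:
            return True                 # hunter stone on the path: still dead
        x, y = nx, ny
    return True

A real playout policy would of course also look at the liberties picked up
along the way; this only shows the walk-without-playing idea.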

-- 
Petr "Pasky" Baudis
UNIX is user friendly, it's just picky about who its friends are.


Re: [Computer-go] European Go Congress 2012

2011-08-04 Thread Ingo Althöfer
Answer given in private mail.
Ingo.




[Computer-go] testing improvements

2011-08-04 Thread Vlad Dumitrescu
Hi all!

I finally got an idea that is worth investigating. Luckily, it is
something that can be tested by modifying existing programs, so I
started to set up an environment to test it. In order to have a
reference, this morning I started a small tournament with two
identical versions of fuego (@1k) and gnugo (@level 10); they have
played 863 rounds so far. The scores against gnugo are almost
identical, but the two fuegos score 449-415 against each other, which
is 52%, and the 95% confidence is ~3%, i.e. ~10 ELO. Now this is within
the limits, and it varies a bit, but the edge is always on the side of
the same instance, never less than 51.5%.
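A quick way to quantify this, as a hypothetical Python sketch using the
normal approximation (only the 449-415 numbers above are from the test):

from math import erf, sqrt

wins, losses = 449, 415
n = wins + losses
p_hat = wins / n                    # observed win rate, ~0.52
se = sqrt(0.25 / n)                 # standard error if both really win 50%
z = (p_hat - 0.5) / se              # ~1.2 standard deviations
p_value = 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))   # two-sided, normal approx.
print(f"win rate {p_hat:.3f}, z = {z:.2f}, two-sided p = {p_value:.2f}")
# -> roughly z = 1.2, p = 0.25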

Is this something normal?

Sorry for the n00b question :-)

best regards,
Vlad


Re: [Computer-go] testing improvements

2011-08-04 Thread David Fotland
Did each fuego play the same number of games vs gnugo, and did each play
half its games on each color?




Re: [Computer-go] testing improvements

2011-08-04 Thread Erik van der Werf
On Thu, Aug 4, 2011 at 6:57 PM, Vlad Dumitrescu vladd...@gmail.com wrote:
 The scores towards gnugo are almost
 identical, but the two fuegos score 449-415, which is 52% and the 95%
 confidence is ~3%, i.e. ~10 ELO.

That 3% is not a 95% confidence interval, more like 1 standard
deviation... (so nothing with high confidence yet)

Erik


Re: [Computer-go] testing improvements

2011-08-04 Thread Kahn Jonas

All the more since you're testing the same idea on two bots
simultaneously. So if you want to be wrong at most five percent of the
time, and consider it better as soon as one of the bots gets
better, you have to make the individual tests at the 2.5% level.

And I'm not even taking into account the fact that you want to continue
testing till you reach significance. That would again require you to take a
lower level.
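A hypothetical Python sketch of this effect (illustrative only): simulate two
comparisons where there is no real improvement at all and count how often at
least one of them looks significant.

import random

def looks_significant(wins, n, z_threshold):
    p = wins / n
    se = (0.25 / n) ** 0.5              # standard error under the 50% null
    return (p - 0.5) / se > z_threshold

def false_positive_rate(n_games=400, trials=5000, z=1.645):
    hits = 0
    for _ in range(trials):
        # two experiments, neither of which is actually an improvement
        a = sum(random.random() < 0.5 for _ in range(n_games))
        b = sum(random.random() < 0.5 for _ in range(n_games))
        if looks_significant(a, n_games, z) or looks_significant(b, n_games, z):
            hits += 1
    return hits / trials

print(false_positive_rate(z=1.645))     # ~0.10: each test at the 5% level
print(false_positive_rate(z=1.960))     # ~0.05: each test at the 2.5% level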

Jonas







Re: [Computer-go] testing improvements

2011-08-04 Thread Vlad Dumitrescu
Hi,

On Thu, Aug 4, 2011 at 19:29, David Fotland fotl...@smart-games.com wrote:
 Did each fuego play the same number of games vs gnugo, and did each play
 half its games on each color?

Yes, I set up an all-play-all competition with gomill.

On Thu, Aug 4, 2011 at 19:55, Erik van der Werf
erikvanderw...@gmail.com wrote:
 On Thu, Aug 4, 2011 at 6:57 PM, Vlad Dumitrescu vladd...@gmail.com wrote:
  The scores towards gnugo are almost
 identical, but the two fuegos score 449-415, which is 52% and the 95%
 confidence is ~3%, i.e. ~10 ELO.

 That 3% is not a 95% confidence interval, more like 1 standard
 deviation... (so nothing with high confidence yet)

I took the easy way out and used a formula mentioned by David Fotland
on this list a while ago:

There is a simple formula to estimate the confidence interval of a result.
I use it to see if a new version is likely better than a reference version
(but I use 95% confidence intervals, so over hundreds of experiments it gives
me the wrong answer too often).
1.96 * sqrt(wr * (1 - wr) / trials)
Where wr is the win rate of one version vs the reference, and trials is the
number of test games.
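As a hypothetical Python helper around that formula (the function name and
the worked numbers, taken from earlier in this thread, are the only additions):

from math import sqrt

def ci95_halfwidth(wr, trials):
    """95% half-width (normal approximation) for a win rate wr over trials games."""
    return 1.96 * sqrt(wr * (1 - wr) / trials)

wins, losses = 449, 415
wr = wins / (wins + losses)
print(f"{wr:.3f} +/- {ci95_halfwidth(wr, wins + losses):.3f}")
# -> about 0.520 +/- 0.033, so 50% is still inside the interval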

On Thu, Aug 4, 2011 at 20:21, Kahn Jonas jonas.k...@math.u-psud.fr wrote:
 All the more since you're testing the same idea on two bots
 simultaneously. So if you want to be wrong at most five percent of the
 time, and consider you are better as soon as one of the bots gets
 better, you have to make individual tests at the 2.5% level.

For the moment I ran the bots without any modification, to check that
everything works. So I think that the results between the identical
bots should have been closer to 50%, or should at least sometimes swing
to the other side of 50%. Right now it's 625-566, which is 52.5% with
a 2.83% confidence interval according to the formula above.

The results are
fuego-1.1 v fuego-new (1199/2000 games)
unknown results: 1 0.08%
board size: 9   komi: 6.5
             wins              black          white        avg cpu
fuego-1.1    569 47.46%       386 64.33%     183 30.55%      2.69
fuego-new    629 52.46%       415 69.28%     214 35.67%      2.67
                              801 66.81%     397 33.11%

I realize that statistical results don't always match what one would
expect, but this should be a straightforward case...

Thanks a lot for all the answers!

regards,
/Vlad


Re: [Computer-go] testing improvements

2011-08-04 Thread David Fotland
Remember that the confidence interval is two sided, so 3% means plus or
minus 3%.  So 52% win rate is within +- 3% of 50%.

David




Re: [Computer-go] testing improvements

2011-08-04 Thread Vlad Dumitrescu
On Thu, Aug 4, 2011 at 22:41, David Fotland fotl...@smart-games.com wrote:
 Remember that the confidence interval is two sided, so 3% means plus or
 minus 3%.  So 52% win rate is within +- 3% of 50%.

Yes, of course. What I reacted to was that throughout the whole test, one
bot always had around 52% wins (well, after the first 100 or so games, at
least). I would have thought it would move around the real value.

Thanks,
Vlad




Re: [Computer-go] testing improvements

2011-08-04 Thread Don Dailey
On Thu, Aug 4, 2011 at 4:41 PM, David Fotland fotl...@smart-games.com wrote:

 Remember that the confidence interval is two sided, so 3% means plus or
 minus 3%.  So 52% win rate is within +- 3% of 50%.


Yes.

And something else that is rarely considered is that the error margin does
not mean what you think it does if you pick and choose when to observe it.
For example, you don't just stop the test because you like the current
result and error margin.

The correct way to interpret the error margin is to decide in advance
exactly how many games you are going to play - then the error margin
means what it is supposed to mean (also considering that it is two-sided).

When I test I use bayeselo and I set the confidence to 99% instead of the
standard 95%, because we are not as strict as we should be about this stuff
(since due to limited resources we must be able to stop tests early).  But
we are at least aware of the problems and issues.
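A hypothetical Python sketch of that effect (not anyone's actual test
harness): keep checking the 95% interval between two identical versions after
every batch of games and stop as soon as it excludes 50%; the "discovery"
rate ends up far above the nominal 5%.

import random
from math import sqrt

def stops_early(max_games=2000, batch=50):
    wins = 0
    for n in range(batch, max_games + 1, batch):
        wins += sum(random.random() < 0.5 for _ in range(batch))
        wr = wins / n
        if abs(wr - 0.5) > 1.96 * sqrt(wr * (1 - wr) / n):
            return True                 # would have declared a real difference
    return False

trials = 2000
print(sum(stops_early() for _ in range(trials)) / trials)
# typically around 0.25-0.30 rather than 0.05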

Don






Re: [Computer-go] testing improvements

2011-08-04 Thread David Fotland
I often see that one side gets lucky early and over a few hundred games the
win rate moves back toward what I expected.  Small numbers of games (like a
few hundred) can be very misleading.

David




Re: [Computer-go] testing improvements

2011-08-04 Thread Vlad Dumitrescu
On Thu, Aug 4, 2011 at 23:12, David Fotland fotl...@smart-games.com wrote:
 I often see that one side gets lucky early and over a few hundred games the
 win rate moves back toward what I expected.  Small numbers of games (like a
 few hundred) can be very misleading.

Great, thanks. This means then that 1000 games can only detect a
change of at least 50 ELO.
/Vlad


Re: [Computer-go] testing improvements

2011-08-04 Thread Don Dailey
On Thu, Aug 4, 2011 at 5:37 PM, Vlad Dumitrescu vladd...@gmail.com wrote:

 On Thu, Aug 4, 2011 at 23:12, David Fotland fotl...@smart-games.com wrote:
  I often see that one side gets lucky early and over a few hundred games the
  win rate moves back toward what I expected.  Small numbers of games (like a
  few hundred) can be very misleading.

 Great, thanks. This means then that 1000 games can only detect a
 change of at least 50 ELO.


Of course it's a matter of how much certainty you want.  If your results
show only 2 or 3 ELO, you need tens of thousands of games to give high
confidence that there is an actual improvement.  However, if your results
show 50 ELO, I believe the error margin after 1000 games is more than
enough to prove that some of this 50 ELO is real.  Of course it may all
be real, but you can only assume that some of it is.
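A rough hypothetical Python sketch of that trade-off, assuming the usual
logistic Elo model and the normal approximation (the function names are
illustrative and the figures are only an order-of-magnitude guide):

def winrate_from_elo(diff):
    """Expected win rate for an Elo advantage of `diff` points."""
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))

def games_for_borderline_detection(elo_diff, z=1.96):
    """Games needed before a real edge of `elo_diff` equals z standard errors
    (i.e. a difference of that size would only just reach the 95% margin)."""
    p = winrate_from_elo(elo_diff)
    return (z / (p - 0.5)) ** 2 * p * (1 - p)

for d in (2, 10, 20, 50):
    print(f"{d:3d} Elo: ~{games_for_borderline_detection(d):7.0f} games")
#   2 Elo: ~ 116000 games (tens of thousands and more)
#  10 Elo: ~   4600 games
#  20 Elo: ~   1200 games
#  50 Elo: ~    190 games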

Don








Re: [Computer-go] pachi questions

2011-08-04 Thread Michael Williams
Why does pachi fill its eye here, when it can simply pass and win the game
(komi = -35.5)?


IN: genmove b
pre-simulated 816 games
(UCT tree; root white; extra komi 0.00; max depth 10)
[F6] 1.000/1002 [prior 0.059/119 amaf 1.000/824 crit 0.000] h=0 c#=3 fbfb7
 [pass] 0.995/1004 [prior 0.500/14 amaf 0.000/0 crit -0.005] h=0 c#=5 fbfbd
(avg score 0.00/0 value 0.00/0)
[186] best 0.00 | seq | can pass(0.995)  A1(0.000) F1(0.000)
*** WINNER is A1 (1,1) with score 0. (0/1002:186/186 games), extra komi 0.00
genmove in 0.10s (1820 games/s, 606 games/s/thread)
playing move A1
Move:  41  Komi: -35.5  Handicap: 0  Captures B: 2 W: 8
      A B C D E F        A B C D E F
    +-------------+    +-------------+
  6 | O O O O . O |  6 | O O O O O O |
  5 | O O . O O . |  5 | O O O O O O |
  4 | O O O . O O |  4 | O O O O O O |
  3 | X X O O O O |  3 | X X O O O O |
  2 | X X X X X X |  2 | X X X X X X |
  1 | X)X X X X . |  1 | X X X X X X |
    +-------------+    +-------------+