I test against a reference opponent (gnugo) and on CGOS.  Testing against the
old version of the same bot can be misleading.

 

Against gnugo, I run gnugo at its top level, on 19x19, and use 2000 playouts
rather than a time limit, for repeatability.  The number of playouts is
picked to give a win rate close to 50%, to get better statistics.  When I
started testing, with a much weaker bot, I tested on 9x9 with 3-minute time
limits.

 

To test for statistical significance in the results I use an approximate
formula for the 95% confidence bounds (roughly two standard deviations):
=1.96*SQRT(win-rate*(1-win-rate)/games-played).
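
As a minimal sketch, the spreadsheet formula above can be written in Python.
The function name win_rate_margin and the sample inputs are illustrative
assumptions, not taken from the original test setup.

```python
import math

def win_rate_margin(win_rate, games_played, z=1.96):
    """Approximate half-width of the confidence bound on a measured win rate.

    Uses the normal approximation to the binomial distribution;
    z=1.96 corresponds to a 95% interval.
    """
    if games_played <= 0:
        raise ValueError("games_played must be positive")
    if not 0.0 <= win_rate <= 1.0:
        raise ValueError("win_rate must be between 0 and 1")
    return z * math.sqrt(win_rate * (1.0 - win_rate) / games_played)

# Hypothetical example: a 50% win rate measured over 2000 games.
margin = win_rate_margin(0.5, 2000)
print(f"+- {margin * 100:.1f}%")
```

The approximation is least reliable for very small samples or for win rates
close to 0% or 100%, which is one more reason to tune the match so the win
rate stays near 50%.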

 

Typically I run 2000 to 4000 game matches, to get the bound to about +- 2% or
below.
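
Turning the same normal-approximation formula around gives a rough estimate
of how many games a match needs for a given bound.  This is only a sketch:
the function name games_needed and the default win rate of 50% are
assumptions made for illustration.

```python
import math

def games_needed(target_margin, win_rate=0.5, z=1.96):
    """Estimate the games required so the 95% bound is within target_margin.

    Solves margin = z * sqrt(p * (1 - p) / n) for n and rounds up.
    """
    if target_margin <= 0:
        raise ValueError("target_margin must be positive")
    if not 0.0 <= win_rate <= 1.0:
        raise ValueError("win_rate must be between 0 and 1")
    n = (z ** 2) * win_rate * (1.0 - win_rate) / (target_margin ** 2)
    return math.ceil(n)

# Hypothetical targets: +-2% and +-1% at a win rate near 50%.
for target in (0.02, 0.01):
    print(f"+- {target * 100:.0f}%: about {games_needed(target)} games")
```

Near a 50% win rate, a +- 2% bound works out to roughly 2400 games, and
halving the bound takes about four times as many games.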

 

CGOS is good for finding bugs, because the bot plays a variety of opponents,
but it takes a long time to get a significant number of games.  Against gnugo
I make changes and run tests once or twice a day, but it takes a week or more
to get good results on CGOS.

 

My experience now is that for every improvement to the program, I try at
least 10 things that make it weaker.

 

David

 

 

From: [email protected]
[mailto:[email protected]] On Behalf Of Steve Safarik
Sent: Saturday, February 19, 2011 11:14 AM
To: [email protected]
Subject: [Computer-go] Assessing Improvements

 

Suppose I develop what I think is an improved feature, for example a better
influence function or some other.  I'd like to hear people's thoughts on how
to best & most quickly determine if it is in fact an improvement.  Do I just
take my new function and replace the equivalent function in something like
Fuego, then have the two engines start playing games?  My impression is that
would be a rather slow way to get enough games to be of significance.  Is
there a better way to compare two engines?  If that is indeed the method
people generally use, how much time do you allow per move or game, and can
you tell me your general experiences with doing this?  Thanks.

 

Steve.

 

 

_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
