>I'd like to hear people's thoughts on how to best & most quickly determine if it is in fact an improvement.

Before commencing a statistical study, I recommend testing your change very thoroughly to see whether it does what you want it to do. Did you really capture the notion that you meant? Did you neglect some low-liberty situation? Or an odd case where two stones are part of the same string? Does your code work under all symmetries?

Another key test is how it behaves in real game positions. You should save your program's own games, of course, and you can download tons of games from the Internet. Would your policy exclude (or upgrade/downgrade) the move actually played? How does it behave on the moves played by winners? Does your policy work equally well in the opening/midgame/endgame? For Black/White? For your moves/your opponent's moves? For human/computer moves?

Something that I would like to do, but have not gotten around to, is to generate cases where the program makes different moves under the new policy. Maybe something like: play games using one policy, then use the other policy as a critic, and submit differences of opinion to an oracle. The oracle could be your own program thinking for 10x as long, or an independent arbiter like Fuego or Pachi, or GnuGo playing the game to the end after each candidate move, or a whole panel of oracles.

Statistical study is an obvious technique, but it should be used in combination with the above. Some caveats:

- Self-play tends to exaggerate differences, because the players are so similar to each other. Programs tend to drift into situations where they disagree, and in this case they will disagree about exactly one thing. (This doesn't mean that you shouldn't do self-play; just balance it against other evidence.)

- Testing against any specific opponent will eventually lead to defeating that opponent. Your testing method will bias your decisions about what to work on.
- One variant can mishandle some early-game situation, leading to losses that would not occur in competitive games. Accordingly, you should do 9x9 testing using an opening book. E.g., Fuego has a reasonable manually selected book. At a minimum, have your program randomly choose between opening with D5 and E5 as Black.

- Differences are likely to be small.

- It is quite likely that your variation would succeed if it were combined with other changes. E.g., your program now handles some cases, but as a result runs into more difficult situations.

- You might get different results at different search efforts. Adding "knowledge" might look good at fast searches but be meaningless at slow searches. A lot depends on how much search effort is needed to find the issue on its own.

- You should know what your variation costs in execution speed. I wouldn't make too much of this, but you should measure. By the time your program is really strong, it will be so slow that you can add almost anything without making it slower. :-)

I like CGOS testing, because it gives me data against diverse opponents at a relatively slow playing speed.

Wait, did you say "quickly"? Well, then never mind...

Brian
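On the question of whether the code works under all symmetries: one cheap automated check is to verify that the policy's output commutes with the eight symmetries of the square board. A minimal sketch in Python — the `policy(board) -> {(row, col): score}` interface, the string-tuple board encoding, and the 9x9 size are illustrative assumptions, not anything from a particular program:

```python
SIZE = 9  # assumed board size for the sketch

def transforms():
    """The 8 symmetries of the square board as (row, col) -> (row, col) maps."""
    n = SIZE - 1
    return [
        lambda r, c: (r, c),            # identity
        lambda r, c: (c, n - r),        # rotate 90
        lambda r, c: (n - r, n - c),    # rotate 180
        lambda r, c: (n - c, r),        # rotate 270
        lambda r, c: (r, n - c),        # mirror columns
        lambda r, c: (n - r, c),        # mirror rows
        lambda r, c: (c, r),            # transpose
        lambda r, c: (n - c, n - r),    # anti-transpose
    ]

def transform_board(board, t):
    """Move the stone at (r, c) to t(r, c); board is a tuple of row strings."""
    new = [['.'] * SIZE for _ in range(SIZE)]
    for r in range(SIZE):
        for c in range(SIZE):
            nr, nc = t(r, c)
            new[nr][nc] = board[r][c]
    return tuple(''.join(row) for row in new)

def symmetric(policy, board):
    """True if the policy's scores commute with every board symmetry."""
    base = policy(board)
    for t in transforms():
        got = policy(transform_board(board, t))
        want = {t(r, c): s for (r, c), s in base.items()}
        if got != want:
            return False
    return True
```

Run `symmetric` over a battery of saved positions; any asymmetry (e.g., a pattern matcher that only scans one orientation) shows up as a mismatch at a transformed point.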
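On the caveat that differences are likely to be small: a quick way to see why a statistical study needs so many games is the normal-approximation confidence interval for a measured win rate. A rough sketch — the function names are mine, and the normal approximation is only trustworthy away from win rates near 0 or 1:

```python
import math

def winrate_ci(wins, games, z=1.96):
    """Approximate 95% confidence interval for the true win rate,
    using the normal approximation to the binomial."""
    p = wins / games
    half = z * math.sqrt(p * (1 - p) / games)
    return p - half, p + half

def games_needed(delta, p=0.5, z=1.96):
    """Rough number of games before the CI half-width drops below delta,
    assuming a near-even matchup (worst case p = 0.5)."""
    return math.ceil((z / delta) ** 2 * p * (1 - p))
```

For example, resolving a 2-percentage-point improvement takes on the order of `games_needed(0.02)`, about 2,400 games, and winning 550 of 1,000 (55%) is only just distinguishable from 50% at this confidence level.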
_______________________________________________ Computer-go mailing list [email protected] http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
