Hi Francois, Welcome

> For reference I need about 100k playouts with
> RAVE to get 50% winrate against GnuGo 3.8 L10.

Yes, that's more or less expected. At least before the "big"
improvements (yet to come ;-)

In my case I do a lot of testing at 4x10000 because the games are
around 15 seconds long and I get fast Elo confidence intervals. At
that rate, 40K playouts/move, I get about 40-42% wins against GnuGo.
This is more or less consistent with a debugged barebones program
without particular smarts (with RAVE, but without progressive
widening). I guess 40% at 40K scales to 50% near 100K, but I have not
studied the exact point where I reach 50%, as I expect it to be much
lower in the near future. (Optimistic.)

> The next step is obviously to apply these to the
> playouts. I am currently testing my program with
> the ELO features in the playouts, but unfortunately
> the preliminary results don't look good.

That's exactly my experience! You do get improvement with the
extend-from-atari, capture and distance-to-previous-move heuristics,
though. The MoGo and CrazyStone papers report improvement "all
features included", which is true because the other ideas produce
improvement, but they don't give results for the patterns in
isolation.

I got a lot of improvement from Rémi's Bradley-Terry ideas in move
prediction (although with some overfitting, which I didn't care much
about, as predicting moves is not my interest). But neither the naive
values (times played / times seen) nor the improved Bradley-Terry
values are better in playouts than uniform random. They are 158 Elo
points worse, CI (114..202)!

That is good and bad news. Why should uniform random be the best?
Obviously it is not. But what humans play lacks all the information
about what they don't play because it is obvious to them, and that is
not obvious to a "silly" playout policy.

How to find good values for the patterns? (What I have tried.)

a. Use small patterns (3x3) with all non-ill-formed patterns in the
   database.
   (Other databases have a value for "unknown"; this one shouldn't.)

b. Classify patterns. I have done that in 40 classes. This way you
   reduce the number of degrees of freedom, so your vector of gamma
   values is in R^40.

c. Then what? I really don't know. I have a "sort of genetic
   algorithm". I like the idea that everything changes at random,
   because the gammas are not independent and this way the expected
   value of the correlation is zero even under stochastic dependence.
   Then I select the "best winners", move my center of gravity one
   little step in one or two classes of patterns, and repeat the
   entire process. Then I test to see if there was improvement.

A long process. I only won a little in the first iterations. After
that, only fake improvement that wasn't verified against uniform
random. In all, about 100 Elo points: less improvement than what the
"human patterns" do wrong.

I guess best playouts are a research area where there is room for
improvement.

Jacques.
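
P.S. The loop in step c can be sketched roughly as below. This is only a minimal illustration under assumptions, not the actual code: the log-normal noise, the population size, the step size and the names `perturb`, `search_step` and `toy_fitness` are all made up for the example. In real use the fitness would be a win rate measured over many games, which is noisy; here a toy function stands in for it.

```python
import math
import random

NUM_CLASSES = 40  # the 40 pattern classes; every other constant below is assumed


def perturb(center, sigma, rng):
    """Return a copy of `center` with every gamma changed at random.

    Multiplicative log-normal noise keeps all gammas positive (an assumption).
    """
    return [g * rng.lognormvariate(0.0, sigma) for g in center]


def search_step(center, fitness, rng, population=16, winners=4,
                sigma=0.2, step=0.1, classes_per_step=2):
    """One iteration: perturb everything, keep the best, nudge a few classes."""
    candidates = [perturb(center, sigma, rng) for _ in range(population)]
    candidates.sort(key=fitness, reverse=True)
    best = candidates[:winners]

    new_center = list(center)
    # Move the center of gravity a small step, in one or two classes only.
    for k in rng.sample(range(len(center)), classes_per_step):
        mean_k = sum(c[k] for c in best) / winners
        new_center[k] += step * (mean_k - center[k])
    return new_center


def toy_fitness(gammas):
    """Hypothetical stand-in for 'win rate over many games'.

    It peaks when every gamma equals 2.0; a real fitness would play games.
    """
    return -sum((math.log(g) - math.log(2.0)) ** 2 for g in gammas)


rng = random.Random(1)
center = [1.0] * NUM_CLASSES
for _ in range(200):
    center = search_step(center, toy_fitness, rng)
```

With a real, noisy fitness each new center would still have to be re-tested against the uniform-random baseline, since an apparent gain inside the loop can be fake.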

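
P.P.S. For reading the numbers above: under the usual logistic Elo model, a 40% win rate is roughly 70 Elo below the opponent, and being 158 Elo worse corresponds to winning roughly 29% of the games. The sketch below shows the conversion. The interval function uses a plain normal approximation on the win rate, which is an assumption for illustration and not necessarily how the intervals quoted above were computed; the function names are made up.

```python
import math


def elo_from_winrate(p):
    """Elo difference implied by win rate p (0 < p < 1), logistic model."""
    return 400.0 * math.log10(p / (1.0 - p))


def winrate_from_elo(diff):
    """Expected win rate for a player `diff` Elo points above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def elo_interval(wins, games, z=1.96):
    """Approximate interval (low, estimate, high) in Elo.

    Uses a normal approximation on the win rate (an assumption), with
    draws ignored; z=1.96 gives roughly a 95% interval.
    """
    p = wins / games
    half = z * math.sqrt(p * (1.0 - p) / games)
    low = max(p - half, 1e-9)          # clamp to keep the logarithm defined
    high = min(p + half, 1.0 - 1e-9)
    return elo_from_winrate(low), elo_from_winrate(p), elo_from_winrate(high)


low, mid, high = elo_interval(wins=160, games=400)  # a 40% score over 400 games
```

The interval narrows only with the square root of the number of games, which is why short games are convenient for this kind of testing.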