Hi Jacques,

> I got a lot of improvement from Rémi's Bradley-Terry
> ideas in move prediction (although with some
> overlearning, which I didn't care much about, as
> predicting moves is not my interest). But neither
> the naive values (times played / times seen) nor the
> improved Bradley-Terry values are better in playouts
> than uniform random. They are 158 CI(114..202) Elo
> points worse!
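For concreteness, the "naive" value quoted above (times played / times seen) can be sketched roughly like this; the counting scheme, names, and sampling fallback are illustrative guesses, not anyone's actual code:

```python
import random
from collections import defaultdict

# Hypothetical sketch: value(pattern) = times played / times seen,
# collected from recorded positions, then used as playout-move weights.

played = defaultdict(int)  # times the pattern at the chosen move was played
seen = defaultdict(int)    # times the pattern appeared as a legal candidate

def observe_position(candidate_patterns, chosen_pattern):
    """Update counts from one recorded position."""
    for p in candidate_patterns:
        seen[p] += 1
    played[chosen_pattern] += 1

def naive_value(pattern):
    """times played / times seen; 0 if never seen."""
    return played[pattern] / seen[pattern] if seen[pattern] else 0.0

def sample_playout_move(candidates):
    """Pick a move with probability proportional to its pattern value,
    falling back to uniform random when all values are zero."""
    weights = [naive_value(p) for p in candidates]
    total = sum(weights)
    if total == 0.0:
        return random.choice(candidates)
    r = random.uniform(0.0, total)
    for p, w in zip(candidates, weights):
        r -= w
        if r <= 0.0:
            return p
    return candidates[-1]
```

Note the catch the thread is about: a policy weighted this way predicts human moves better than uniform random, yet still plays worse in playouts.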
So it seems this is not just me. I was kinda expecting an
almost-guaranteed increase in strength.

> a. Use small patterns (3x3) with all non-ill-formed
> patterns in the database. (Other databases have a value
> for "unknown"; this one shouldn't.)

You mean all the 3x3 patterns? I'm only using 3x3 patterns
that occur a number of times in my training collection.

> b. Classify patterns. I have done that in 40 classes.
> This way you reduce the number of degrees of freedom.
> So your vector of gamma values is in R^40.

I'm not following you here. What sort of classes? Degrees of
freedom? R^40? Would you mind explaining a bit for me?

Your genetic algo looks interesting, but I have a feeling I can
get a better return for my time working on other things at the moment.

--
Francois van Niekerk
Email: [email protected] | Twitter: @francoisvn
Cell: +2784 0350 214 | Website: http://leafcloud.com

On Thu, Dec 30, 2010 at 11:57 PM, Jacques Basaldúa <[email protected]> wrote:
> Hi Francois, welcome!
>
>> For reference I need about 100k playouts with
>> RAVE to get 50% winrate against GnuGo 3.8 L10.
>
> Yes, that's more or less expected. At least before the
> "big" improvements (yet to come ;-)
>
> In my case I do a lot of testing at 4x10000 because
> the games are around 15 seconds long and I get fast
> Elo confidence intervals. At that rate (40K playouts/move)
> I get about 40-42% wins against GnuGo. This is more
> or less consistent with a debugged barebones engine without
> particular smarts (but with RAVE, without progressive
> widening). I guess 40% at 40000 scales to 50% near
> 100K, but the exact point where I reach 50% has not
> been studied, as I expect it to be much lower in the
> near future. (Optimistic.)
>
>> The next step is obviously to apply these to the
>> playouts. I am currently testing my program with
>> the Elo features in the playouts, but unfortunately
>> the preliminary results don't look good.
>
> That's exactly my experience!
> Although you do get
> improvement with the extend-from-atari / capture / distance-to-previous
> heuristics.
>
> The Mogo and CrazyStone papers report improvement
> with "all features included", which is true because the
> other ideas produce improvement, but they don't
> give results for the patterns in isolation.
>
> I got a lot of improvement from Rémi's Bradley-Terry
> ideas in move prediction (although with some
> overlearning, which I didn't care much about, as
> predicting moves is not my interest). But neither
> the naive values (times played / times seen) nor the
> improved Bradley-Terry values are better in playouts
> than uniform random. They are 158 CI(114..202) Elo
> points worse!
>
> That is good and bad news. Why should uniform random
> be the best? Obviously it is not. But what humans
> play lacks all the information about what they don't
> play, because it is obvious to them, but it is not
> obvious to a "silly" playout policy.
>
> How to find good values for the patterns? (What I have
> tried.)
>
> a. Use small patterns (3x3) with all non-ill-formed
> patterns in the database. (Other databases have a value
> for "unknown"; this one shouldn't.)
>
> b. Classify patterns. I have done that in 40 classes.
> This way you reduce the number of degrees of freedom.
> So your vector of gamma values is in R^40.
>
> c. Then what? I really don't know. I have a "sort of
> genetic algorithm". I like the idea that everything
> changes at random, because the gammas are not
> independent, and this way the expected value of the
> correlation is zero even under stochastic dependence.
> Then I select the "best winners", move my center of
> gravity one little step in one or two classes of patterns,
> and repeat the entire process. Then test to see if there
> was improvement. A long process. I only won a little
> in the first iterations. After that, only fake improvement
> that wasn't verified against uniform random.
>
> In all about 100 Elo points, less improvement than
> the "human patterns" do wrong.
> I guess best playouts
> are a research area where there is room for improvement.
>
> Jacques.

_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
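The "sort of genetic algorithm" in step (c) above could look something like this one-iteration sketch: perturb every class gamma at random, keep the best winners, and move the center of gravity a small step in only the one or two classes where the winners most agree on a change. All parameters, names, and the stand-in fitness are hypothetical, not Jacques's actual settings:

```python
import random

NUM_CLASSES = 40  # one gamma per pattern class, so the vector lives in R^40

def perturb(gammas, sigma=0.1):
    """Change every gamma at random at once; because the gammas are not
    independent, mutating all of them keeps the expected correlation
    of the changes at zero even under stochastic dependence."""
    return [max(1e-6, g + random.gauss(0.0, sigma)) for g in gammas]

def search_step(center, fitness, population=20, elite=5, step=0.2):
    """One iteration: sample perturbed candidates, select the best
    winners by fitness, and nudge the center of gravity toward the
    winners' mean in the one or two classes with the largest shift."""
    candidates = [perturb(center) for _ in range(population)]
    candidates.sort(key=fitness, reverse=True)
    winners = candidates[:elite]
    mean = [sum(w[i] for w in winners) / elite for i in range(NUM_CLASSES)]
    deltas = [m - c for m, c in zip(mean, center)]
    moved = sorted(range(NUM_CLASSES), key=lambda i: abs(deltas[i]),
                   reverse=True)[:2]
    new_center = list(center)
    for i in moved:
        new_center[i] += step * deltas[i]
    return new_center
```

In practice the fitness would be an Elo estimate from matches against the uniform-random playout policy, which is exactly why each iteration of this process is so slow.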
