I attempted that many years ago. It was ~2004, IIRC, before UCT/RAVE. I started from the simulated annealing model in Bruegmann's paper. Using data from that search, I tried to create decision trees along the lines you describe.
The problem was that performance was terrible. The cost of building the trees was maybe 100x the cost of simulation. I could see how I might reduce that to 10x, but overall it didn't seem promising. Then I heard about UCT and MoGo and figured I was on the wrong path. Maybe I just had the wrong implementation, or made poor decisions about the feature set used by the decision tree.

I have since done studies where I collect data from Pebbles searches and build decision trees that predict winning percentage. The trees seemed to capture some skills, but it was unclear how to use that in the search process. Note that Pebbles is a much stronger player than my simulated annealing player was, so its conclusions have less variance, and from that you get better decision tree induction.

I still think something should work in this area. I will try again after Pebbles catches up to the state of the art. (That will take a while; I have quite a backlog.)

Brian

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Peter Drake
Sent: Thursday, June 30, 2011 12:30 PM
To: [email protected]
Subject: [Computer-go] Decision trees: making the most of each playout

The general Monte Carlo approach is:

    Repeat until golden brown:
        Perform a playout, guided by the current policy
        Determine the winner
        Adjust the policy

The policy is adjusted so that winning moves are played more often, losing moves less often (with some exploration thrown in). To make the most of each playout, the policy should generalize, so that a move that has done well in one situation is also considered good in "similar" situations.

As discussed at length in our Power of Forgetting paper, much hinges on the definition of "similar". At one extreme, all situations are considered similar; the earliest MC Go work did this. At the other extreme, every situation (complete game history) is considered unique; this is pure MCTS.
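[For concreteness, here is a minimal runnable sketch of the generic playout/adjust loop described above. The game is a toy subtraction game (take 1 or 2 stones; whoever takes the last stone wins), chosen only so the example is self-contained; it is not Go, and none of the names come from any real engine.]

```python
import random

def monte_carlo(pile, playouts=5000, epsilon=0.2, seed=0):
    """Repeat until golden brown: play out, find the winner, adjust."""
    rng = random.Random(seed)
    wins, plays = {}, {}  # policy statistics, keyed by (pile, move)

    def policy_move(p):
        moves = [m for m in (1, 2) if m <= p]
        if rng.random() < epsilon:            # exploration thrown in
            return rng.choice(moves)
        # Exploit: best observed win rate so far (optimistic default).
        return max(moves,
                   key=lambda m: wins.get((p, m), 1) / (plays.get((p, m), 0) + 1))

    for _ in range(playouts):
        p, player, history = pile, 0, []
        while p > 0:                          # perform a playout
            m = policy_move(p)
            history.append((p, m, player))
            p -= m
            player = 1 - player
        winner = 1 - player                   # determine the winner
        for q, m, who in history:             # adjust the policy
            plays[(q, m)] = plays.get((q, m), 0) + 1
            if who == winner:
                wins[(q, m)] = wins.get((q, m), 0) + 1

    moves = [m for m in (1, 2) if m <= pile]
    return max(moves, key=lambda m: wins.get((pile, m), 0) / plays.get((pile, m), 1))

print(monte_carlo(4))  # -> 1 (taking 1 leaves a losing pile of 3)
```

[Note that the statistics here are keyed by (pile, move), i.e. by full state; collapsing the key to just the move would give the "all situations are similar" extreme discussed below.]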
In between lie more successful approaches like transposition tables, RAVE, and the Last Good Reply policy. We would also like the policy to converge on an optimal policy given sufficient time (which might be possible at the end of a close game), and for it to be possible to pre-initialize the policy with domain-specific knowledge (ideally learned automatically from self-play or recorded games).

Here's our latest thought: Begin at the "all situations are the same" extreme: gather a win rate for each point on the board, regardless of when it is played. This will generate data very quickly, because each playout generates (roughly) one datum for each point.

As these piles of data expand, split them by context. For example, maybe (from some particular real board position) A1 is a bad move in general, but an excellent move if B1 was the immediately preceding move. We would therefore split the data for A1 into two piles: "B1 was the previous move" and "B1 was not the previous move". This would continue, growing a tree in the fashion of traditional decision tree induction algorithms (ID3, C4.5, CART, etc.).

Advantages of this (as yet untried) approach:

1) Underexplored moves get many data on which to estimate their value. These data, being drawn from deep in playouts, will fluctuate considerably, providing useful exploration noise in the same way that RAVE does.

2) Given a sufficiently rich set of features on which to split, this would converge on a perfect policy. The entire history of the game is certainly a sufficiently rich set of features, but we could also add patterns, atari, etc.

Disadvantages:

1) It's not immediately clear how to pre-initialize the tree to take advantage of domain knowledge. Ideally this could be done through self-play or examining recorded games.

Comments?
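[A hypothetical sketch of the splitting step described above, in the style of ID3: given win/loss records for one move, each tagged with boolean context features such as "B1 was the previous move", pick the feature whose split yields the largest information gain. The feature names and the toy pile of data are illustrative, not taken from a real engine.]

```python
import math

def entropy(records):
    """Binary entropy of the win rate over (features, won) records."""
    if not records:
        return 0.0
    p = sum(won for _, won in records) / len(records)
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def best_split(records, features):
    """Return the (feature, information gain) that best splits the pile."""
    base = entropy(records)
    best = None
    for f in features:
        yes = [r for r in records if r[0][f]]
        no = [r for r in records if not r[0][f]]
        remainder = (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(records)
        gain = base - remainder
        if best is None or gain > best[1]:
            best = (f, gain)
    return best

# Toy pile for move A1: it wins mostly when B1 was the previous move.
pile = [({"prev_is_B1": True,  "in_atari": False}, 1)] * 8 \
     + [({"prev_is_B1": True,  "in_atari": True},  0)] * 2 \
     + [({"prev_is_B1": False, "in_atari": False}, 0)] * 7 \
     + [({"prev_is_B1": False, "in_atari": True},  1)] * 3

feature, gain = best_split(pile, ["prev_is_B1", "in_atari"])
print(feature)  # -> prev_is_B1
```

[Applying this recursively to each resulting pile grows the tree; whether the induction cost stays below the cost of the playouts themselves is exactly the performance question raised at the top of this thread.]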
Peter Drake
http://www.lclark.edu/~drake/

_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
