I have traced many, many bad moves to RAVE failures. This post is a "brain dump" on what I have learned.
SYMPTOMS -------- I classify two failure modes: 1) RAVE searches a bad move because it is good later. 2) RAVE won't search the best move because it is bad later. The first failure mode causes inefficiency. That is, if RAVE Likes a move then it will be searched and accumulate actual UCT trials, and then it must be refuted before the best move can be found. If inefficiency only cost an occasional factor of 2, then I would not worry about it. Unfortunately, RAVE will probably like a move for several ply, which means that inefficiency applies recursively. If you can play several forcing moves to delay consequences, then inefficiency can be fatal. The second failure mode is more problematic, because it is not clear how many moves have to be refuted in order for RAVE to give the best move a trial. Moreover, every trial for every alternative move creates another probably bad RAVE trial for the best move. EXPLANATION IN CAUSALITY TERMS ----------------------------------- RAVE finds moves that correlate with winning. RAVE wants moves that *cause* winning. EXPLANATION IN DOMAIN TERMS -------------------------------- Failure mode 1 above occurs because some moves are only playable when we win. For example, we succeed in breaking into the opponent's territory and therefore a move in that region becomes good. Failure mode 2 above occurs when a move that is vital to winning now stops being able to win in the future. For example, suppose that you have a critical semeai. If you don't play a winning move now, then all of those moves become losing moves in the future. PEBBLES vs FUEGO ---------------- I suspect that RAVE problems are more severe in Pebbles than in Fuego. There are three differences: 1) Fuego has a larger exploration coefficient. (0.7 vs 0.25) 2) Fuego combines UCT and RAVE and then adds the UCT exploration term, whereas Pebbles applies separate RAVE and UCT exploration terms. 3) Fuego's RAVE bonus decreases with distance. In the first two cases, I have measured that Pebbles choices are optimal for Pebbles, despite the problems that ensue. I should test the third difference, but have not yet. WHAT I HAVE DONE ABOUT IT ------------------------- I have made some headway using special cases. For example, there is a special case involving approach moves: if a move A is self-atari, and the *other* liberty B of that string is not self-atari, then the RAVE score of B will be the higher of the RAVE score of A and the RAVE score of B. (Magnus's idea, BTW.) There are also policies in the playouts that reduce RAVE failures. Fuego has "clump correction", for example. RAVE receives better information when the playouts identify better local moves. I have also tested alternatives in the exploration policy. For example, a exploring positions that seem to be losing more broadly than positions that seem to be winning. And the Orego team's suggestion to use the beta distribution to measure variance. These refinements can have large benefits, but they do not completely solve the problem. WHAT SHOULD BE DONE ------------------- IMO, we have to redesign RAVE. I have been slow to do so because RAVE is so powerful. Really, our programs would not play well at all without RAVE. Still, as it is presently conceived, RAVE places a scalability limit on the program. It will have to be addressed eventually. Hillis has proposed that moves have "context codes" that we can compare against the current context, and only score RAVE if the codes "match". My instinct is that this is the right direction. Dave Fotland said that testing 3x3 neighborhoods yielded no benefit. My inference is that contexts should not limit RAVE collection too much. I think that Hillis and Dailey reported tests that allowed RAVE matching against the opponent's moves, with results that were possibly beneficial. Most likely the contexts should be graduated, so that we match many moves early in a node's lifecycle, and require a restrictive match as the node ages. I will test a solution like this in Pebbles after I work out the math involving how to fit parameters to optimize the value of contexts. _______________________________________________ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/