Although I've never done a RAVE implementation (soon, very soon), this is
related in the sense that AMAF is related to RAVE:
I have recently (yesterday, actually) done some experiments on AMAF with 5-vertex patterns (normal AMAF can be considered 1-vertex patterns). I was not able to
observe any improvement in any of the experiments -- small/large number of playouts per move, various AMAF enhancements turned on/off. Very frustrating since
it seems like such a reasonable idea. My implementation was as follows: If a playout move did NOT match the vertex's pattern at the root, normal AMAF credit
was given. If a playout move DID match the vertex's pattern at root, then the credit was multiplied. So a pattern credit multiplier of 1.0 is equivalent to
straight AMAF. I tried values in 1.1 to 2.0. As I said, I couldn't find any improvement. I should try 9-vertex patterns too, but I was discouraged by the
5-vertex results.
Brian Sheppard wrote:
I have traced many, many bad moves to RAVE failures. This post
is a "brain dump" on what I have learned.
SYMPTOMS
--------
I classify two failure modes:
1) RAVE searches a bad move because it is good later.
2) RAVE won't search the best move because it is bad later.
The first failure mode causes inefficiency. That is, if RAVE
Likes a move then it will be searched and accumulate actual UCT
trials, and then it must be refuted before the best move can
be found.
If inefficiency only cost an occasional factor of 2, then I
would not worry about it. Unfortunately, RAVE will probably
like a move for several ply, which means that inefficiency
applies recursively. If you can play several forcing moves to
delay consequences, then inefficiency can be fatal.
The second failure mode is more problematic, because it is not
clear how many moves have to be refuted in order for RAVE to
give the best move a trial. Moreover, every trial for every
alternative move creates another probably bad RAVE trial for
the best move.
EXPLANATION IN CAUSALITY TERMS
-----------------------------------
RAVE finds moves that correlate with winning. RAVE wants moves
that *cause* winning.
EXPLANATION IN DOMAIN TERMS
--------------------------------
Failure mode 1 above occurs because some moves are only playable
when we win. For example, we succeed in breaking into the
opponent's territory and therefore a move in that region
becomes good.
Failure mode 2 above occurs when a move that is vital to winning
now stops being able to win in the future. For example, suppose
that you have a critical semeai. If you don't play a winning
move now, then all of those moves become losing moves in the
future.
PEBBLES vs FUEGO
----------------
I suspect that RAVE problems are more severe in Pebbles than in
Fuego. There are three differences:
1) Fuego has a larger exploration coefficient. (0.7 vs 0.25)
2) Fuego combines UCT and RAVE and then adds the UCT exploration
term, whereas Pebbles applies separate RAVE and UCT exploration
terms.
3) Fuego's RAVE bonus decreases with distance.
In the first two cases, I have measured that Pebbles choices are
optimal for Pebbles, despite the problems that ensue. I should
test the third difference, but have not yet.
WHAT I HAVE DONE ABOUT IT
-------------------------
I have made some headway using special cases.
For example, there is a special case involving approach moves:
if a move A is self-atari, and the *other* liberty B of that
string is not self-atari, then the RAVE score of B will be the
higher of the RAVE score of A and the RAVE score of B. (Magnus's
idea, BTW.)
There are also policies in the playouts that reduce RAVE
failures. Fuego has "clump correction", for example. RAVE
receives better information when the playouts identify better
local moves.
I have also tested alternatives in the exploration policy. For
example, a exploring positions that seem to be losing more
broadly than positions that seem to be winning. And the Orego
team's suggestion to use the beta distribution to measure
variance.
These refinements can have large benefits, but they do not
completely solve the problem.
WHAT SHOULD BE DONE
-------------------
IMO, we have to redesign RAVE. I have been slow to do so because
RAVE is so powerful. Really, our programs would not play well at
all without RAVE. Still, as it is presently conceived, RAVE
places a scalability limit on the program. It will have to be
addressed eventually.
Hillis has proposed that moves have "context codes" that we can
compare against the current context, and only score RAVE if the
codes "match". My instinct is that this is the right direction.
Dave Fotland said that testing 3x3 neighborhoods yielded no
benefit. My inference is that contexts should not limit RAVE
collection too much.
I think that Hillis and Dailey reported tests that allowed RAVE
matching against the opponent's moves, with results that were
possibly beneficial.
Most likely the contexts should be graduated, so that we match
many moves early in a node's lifecycle, and require a
restrictive match as the node ages.
I will test a solution like this in Pebbles after I work out the
math involving how to fit parameters to optimize the value of
contexts.
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/