I have traced many, many bad moves to RAVE failures. This post
is a "brain dump" on what I have learned.

SYMPTOMS
--------
I classify two failure modes:

    1) RAVE searches a bad move because it is good later.
    2) RAVE won't search the best move because it is bad later.

The first failure mode causes inefficiency. That is, if RAVE
Likes a move then it will be searched and accumulate actual UCT 
trials, and then it must be refuted before the best move can
be found.

If inefficiency only cost an occasional factor of 2, then I
would not worry about it. Unfortunately, RAVE will probably
like a move for several ply, which means that inefficiency 
applies recursively. If you can play several forcing moves to 
delay consequences, then inefficiency can be fatal.

The second failure mode is more problematic, because it is not 
clear how many moves have to be refuted in order for RAVE to 
give the best move a trial. Moreover, every trial for every 
alternative move creates another probably bad RAVE trial for
the best move.


EXPLANATION IN CAUSALITY TERMS
-----------------------------------
RAVE finds moves that correlate with winning. RAVE wants moves 
that *cause* winning.


EXPLANATION IN DOMAIN TERMS
--------------------------------
Failure mode 1 above occurs because some moves are only playable 
when we win. For example, we succeed in breaking into the 
opponent's territory and therefore a move in that region 
becomes good.

Failure mode 2 above occurs when a move that is vital to winning 
now stops being able to win in the future. For example, suppose 
that you have a critical semeai. If you don't play a winning 
move now, then all of those moves become losing moves in the 
future.


PEBBLES vs FUEGO
----------------
I suspect that RAVE problems are more severe in Pebbles than in 
Fuego. There are three differences:

1) Fuego has a larger exploration coefficient. (0.7 vs 0.25)
2) Fuego combines UCT and RAVE and then adds the UCT exploration 
term, whereas Pebbles applies separate RAVE and UCT exploration 
terms.
3) Fuego's RAVE bonus decreases with distance.

In the first two cases, I have measured that Pebbles choices are 
optimal for Pebbles, despite the problems that ensue. I should 
test the third difference, but have not yet.


WHAT I HAVE DONE ABOUT IT
-------------------------
I have made some headway using special cases.

For example, there is a special case involving approach moves: 
if a move A is self-atari, and the *other* liberty B of that 
string is not self-atari, then the RAVE score of B will be the 
higher of the RAVE score of A and the RAVE score of B. (Magnus's 
idea, BTW.)

There are also policies in the playouts that reduce RAVE 
failures. Fuego has "clump correction", for example. RAVE 
receives better information when the playouts identify better 
local moves.

I have also tested alternatives in the exploration policy. For 
example, a exploring positions that seem to be losing more 
broadly than positions that seem to be winning. And the Orego 
team's suggestion to use the beta distribution to measure 
variance.

These refinements can have large benefits, but they do not 
completely solve the problem.


WHAT SHOULD BE DONE
-------------------
IMO, we have to redesign RAVE. I have been slow to do so because 
RAVE is so powerful. Really, our programs would not play well at 
all without RAVE. Still, as it is presently conceived, RAVE 
places a scalability limit on the program. It will have to be 
addressed eventually.

Hillis has proposed that moves have "context codes" that we can 
compare against the current context, and only score RAVE if the 
codes "match". My instinct is that this is the right direction.

Dave Fotland said that testing 3x3 neighborhoods yielded no 
benefit. My inference is that contexts should not limit RAVE 
collection too much.

I think that Hillis and Dailey reported tests that allowed RAVE 
matching against the opponent's moves, with results that were 
possibly beneficial.

Most likely the contexts should be graduated, so that we match 
many moves early in a node's lifecycle, and require a 
restrictive match as the node ages.

I will test a solution like this in Pebbles after I work out the 
math involving how to fit parameters to optimize the value of 
contexts.

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Reply via email to