Re: [computer-go] RAVE problems

Michael Williams Fri, 07 Aug 2009 17:12:33 -0700

Although I've never done a RAVE implementation (soon, very soon), this is 
related in the sense that AMAF is related to RAVE:

I have recently (yesterday, actually) done some experiments on AMAF with 5-vertex patterns (normal AMAF can be considered 1-vertex patterns). I was not able toobserve any improvement in any of the experiments -- small/large number of playouts per move, various AMAF enhancements turned on/off. Very frustrating sinceit seems like such a reasonable idea. My implementation was as follows: If a playout move did NOT match the vertex's pattern at the root, normal AMAF creditwas given. If a playout move DID match the vertex's pattern at root, then the credit was multiplied. So a pattern credit multiplier of 1.0 is equivalent tostraight AMAF. I tried values in 1.1 to 2.0. As I said, I couldn't find any improvement. I should try 9-vertex patterns too, but I was discouraged by the5-vertex results.



Brian Sheppard wrote:

I have traced many, many bad moves to RAVE failures. This post
is a "brain dump" on what I have learned.

SYMPTOMS
--------
I classify two failure modes:

    1) RAVE searches a bad move because it is good later.
    2) RAVE won't search the best move because it is bad later.

The first failure mode causes inefficiency. That is, if RAVE
Likes a move then it will be searched and accumulate actual UCTtrials, and then it must be refuted before the best move can
be found.

If inefficiency only cost an occasional factor of 2, then I
would not worry about it. Unfortunately, RAVE will probably
like a move for several ply, which means that inefficiencyapplies recursively. If you can play several forcing moves todelay consequences, then inefficiency can be fatal.
The second failure mode is more problematic, because it is notclear how many moves have to be refuted in order for RAVE togive the best move a trial. Moreover, every trial for everyalternative move creates another probably bad RAVE trial for
the best move.


EXPLANATION IN CAUSALITY TERMS
-----------------------------------
RAVE finds moves that correlate with winning. RAVE wants movesthat *cause* winning.
EXPLANATION IN DOMAIN TERMS
--------------------------------
Failure mode 1 above occurs because some moves are only playablewhen we win. For example, we succeed in breaking into theopponent's territory and therefore a move in that regionbecomes good.
Failure mode 2 above occurs when a move that is vital to winningnow stops being able to win in the future. For example, supposethat you have a critical semeai. If you don't play a winningmove now, then all of those moves become losing moves in thefuture.
PEBBLES vs FUEGO
----------------
I suspect that RAVE problems are more severe in Pebbles than inFuego. There are three differences:
1) Fuego has a larger exploration coefficient. (0.7 vs 0.25)
2) Fuego combines UCT and RAVE and then adds the UCT explorationterm, whereas Pebbles applies separate RAVE and UCT explorationterms.
3) Fuego's RAVE bonus decreases with distance.
In the first two cases, I have measured that Pebbles choices areoptimal for Pebbles, despite the problems that ensue. I shouldtest the third difference, but have not yet.
WHAT I HAVE DONE ABOUT IT
-------------------------
I have made some headway using special cases.
For example, there is a special case involving approach moves:if a move A is self-atari, and the *other* liberty B of thatstring is not self-atari, then the RAVE score of B will be thehigher of the RAVE score of A and the RAVE score of B. (Magnus'sidea, BTW.)
There are also policies in the playouts that reduce RAVEfailures. Fuego has "clump correction", for example. RAVEreceives better information when the playouts identify betterlocal moves.
I have also tested alternatives in the exploration policy. Forexample, a exploring positions that seem to be losing morebroadly than positions that seem to be winning. And the Oregoteam's suggestion to use the beta distribution to measurevariance.
These refinements can have large benefits, but they do notcompletely solve the problem.
WHAT SHOULD BE DONE
-------------------
IMO, we have to redesign RAVE. I have been slow to do so becauseRAVE is so powerful. Really, our programs would not play well atall without RAVE. Still, as it is presently conceived, RAVEplaces a scalability limit on the program. It will have to beaddressed eventually.
Hillis has proposed that moves have "context codes" that we cancompare against the current context, and only score RAVE if thecodes "match". My instinct is that this is the right direction.
Dave Fotland said that testing 3x3 neighborhoods yielded nobenefit. My inference is that contexts should not limit RAVEcollection too much.
I think that Hillis and Dailey reported tests that allowed RAVEmatching against the opponent's moves, with results that werepossibly beneficial.
Most likely the contexts should be graduated, so that we matchmany moves early in a node's lifecycle, and require arestrictive match as the node ages.
I will test a solution like this in Pebbles after I work out themath involving how to fit parameters to optimize the value ofcontexts.
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/


_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] RAVE problems

Reply via email to