Re: [computer-go] How to "properly" implement RAVE?

Sylvain Gelly Fri, 09 Jan 2009 11:34:39 -0800

Hi Isaac,
in a nutshell RAVE is basically AMAF adapted for Monte Carlo Tree Search.
The original paper describing it is
http://www.machinelearning.org/proceedings/icml2007/papers/387.pdf and a
paper for "broader audience" can be found here:
http://www.lri.fr/~gelly/paper/MoGoNectar.pdf (the picture you posted comes
from that paper).


I think your description of RAVE is not completely correct, or at least I
don't understand it completely :). If I understand that sentence "only the
siblings of the last node in the tree are updated accordingly", then it is
wrong.

There are two important parts in the algorithms: the backup and the use of
the RAVE value. The second is the hardest to tune and to make it right. The
proposed way of using the values in the original paper is not optimal (while
already very useful). A much better way (especially in 19x19) has been
described on that list by David Silver.
For the backup (as it is your original question), for each node traversed by
the simulation, back up the values exactly as it would be done in AMAF *if*
the playout began at that node. Note that I call playout the whole
simulation starting from the root and going to the end of the game.

Things you have to take care about, for each node, including the root,
including the last node in the tree (most of them obvious, but believe it is
really easy to forget small details and get something totally useless):
  - Count only moves that happen after the node.
  - Count only a move if it is played first on the intersection.
Be careful with captures, especially if they occur within the tree. It is
really easy to mess up the statistics.
  - Count only a move if it is the same color of the node (if in the
position of the node, black is to play, count only black moves for that
node)
  - Count all moves that happen after the node, including those happening
within the tree, and including the move that happen just after the node.

I hope it will help you write a correct implementation. Don't hesitate to
ask for precisions.

Sylvain

2009/1/9 Isaac Deutsch <i...@gmx.ch>

> I'm a bit confused about the difference between AMAF and RAVE (if there's
> one).
> As far as I understand, with AMAF, you sample in each playout (after it
> leaves
> the tree) the moves played (by both players), but only the first move at
> any
> position, then you reward all moves played either by a win or loss,
> depending on
> their colors.
>
> I tried comparing this to RAVE, as described in many papers. There might be
> multiple definitions of RAVE, but the one that seems the most clear to me
> is
> this one (picture used because of math stuff):
>
> http://img352.imageshack.us/img352/443/bild1pb3.png
>
> Is it correct that with RAVE, after a playout (after the tree), only the
> siblings of the last node in the tree are updated accordingly? The main
> difference to AMAF would be that instead of a single array with values of
> AMAFsumReward and AMAFnumPlayed, each node keeps for each of its children
> separate variables that hold these values. When a playout is finished, only
> the
> values of these children are updated instead of the single array.
>
> I hope you're able to make any sense out of this, sometimes a text can be
> confusing when the writer is confused. ;p
>
> Cheers, ibd
> --
> Sensationsangebot verlängert: GMX FreeDSL - Telefonanschluss + DSL
> für nur 16,37 Euro/mtl.!* http://dsl.gmx.de/?ac=OM.AD.PD003K1308T4569a
> _______________________________________________
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
>

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to "properly" implement RAVE?

Reply via email to