Hi Isaac, in a nutshell RAVE is basically AMAF adapted for Monte Carlo Tree Search. The original paper describing it is http://www.machinelearning.org/proceedings/icml2007/papers/387.pdf and a paper for "broader audience" can be found here: http://www.lri.fr/~gelly/paper/MoGoNectar.pdf (the picture you posted comes from that paper).
I think your description of RAVE is not completely correct, or at least I don't understand it completely :). If I understand that sentence "only the siblings of the last node in the tree are updated accordingly", then it is wrong. There are two important parts in the algorithms: the backup and the use of the RAVE value. The second is the hardest to tune and to make it right. The proposed way of using the values in the original paper is not optimal (while already very useful). A much better way (especially in 19x19) has been described on that list by David Silver. For the backup (as it is your original question), for each node traversed by the simulation, back up the values exactly as it would be done in AMAF *if* the playout began at that node. Note that I call playout the whole simulation starting from the root and going to the end of the game. Things you have to take care about, for each node, including the root, including the last node in the tree (most of them obvious, but believe it is really easy to forget small details and get something totally useless): - Count only moves that happen after the node. - Count only a move if it is played first on the intersection. Be careful with captures, especially if they occur within the tree. It is really easy to mess up the statistics. - Count only a move if it is the same color of the node (if in the position of the node, black is to play, count only black moves for that node) - Count all moves that happen after the node, including those happening within the tree, and including the move that happen just after the node. I hope it will help you write a correct implementation. Don't hesitate to ask for precisions. Sylvain 2009/1/9 Isaac Deutsch <i...@gmx.ch> > I'm a bit confused about the difference between AMAF and RAVE (if there's > one). > As far as I understand, with AMAF, you sample in each playout (after it > leaves > the tree) the moves played (by both players), but only the first move at > any > position, then you reward all moves played either by a win or loss, > depending on > their colors. > > I tried comparing this to RAVE, as described in many papers. There might be > multiple definitions of RAVE, but the one that seems the most clear to me > is > this one (picture used because of math stuff): > > http://img352.imageshack.us/img352/443/bild1pb3.png > > Is it correct that with RAVE, after a playout (after the tree), only the > siblings of the last node in the tree are updated accordingly? The main > difference to AMAF would be that instead of a single array with values of > AMAFsumReward and AMAFnumPlayed, each node keeps for each of its children > separate variables that hold these values. When a playout is finished, only > the > values of these children are updated instead of the single array. > > I hope you're able to make any sense out of this, sometimes a text can be > confusing when the writer is confused. ;p > > Cheers, ibd > -- > Sensationsangebot verlängert: GMX FreeDSL - Telefonanschluss + DSL > für nur 16,37 Euro/mtl.!* http://dsl.gmx.de/?ac=OM.AD.PD003K1308T4569a > _______________________________________________ > computer-go mailing list > computer-go@computer-go.org > http://www.computer-go.org/mailman/listinfo/computer-go/ >
_______________________________________________ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/