Re: [Computer-go] Understanding and implementing RAVE

Gonçalo Mendes Ferreira Mon, 26 Oct 2015 05:20:49 -0700

I'm not sure about "if color of that particular (child) node was the
first to play on that intersection in the playout". What I think the
authors meant was to only increment the AMAF statistic once because of
ko repetitions. So, repeatable at different states, regardless of color.
Now I'm not so sure.

Other than that you're missing the UCB part of the formula because
Pachi/Michi doesn't use one. I also prefer storing in the node the
visits/wins/amaf_visits/amaf_wins of every (legal) transition, and you
seem to store that relative to the state. That should save space but how
does it fare in traversing the legal plays to select the transition in UCT?
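
For reference, the UCB part mentioned above is usually an exploration
bonus added on top of the win rate. A minimal sketch, assuming the
standard UCB1 form; the function name and the constant c are
illustrative choices, not taken from Pachi or Michi:

```python
import math


def ucb1_bonus(parent_visits: int, child_visits: int, c: float = 1.4) -> float:
    """UCB1-style exploration bonus for one child (hypothetical helper).

    c is an assumed exploration constant that would need tuning.
    """
    if child_visits == 0:
        # Unvisited children get an infinite bonus so they are tried first.
        return math.inf
    # Guard against log(0) when the parent has not been visited yet.
    return c * math.sqrt(math.log(max(parent_visits, 1)) / child_visits)
```

The bonus would be added to whatever value the selection step already
computes for a child, and it shrinks as that child collects visits.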


Gonçalo F.

On 26/10/2015 10:36, Urban Hafner wrote:

With the help of Michi <https://github.com/pasky/michi> (thank you Petr!)
I’m currently working on adding RAVE to my UCT tree search. Before I get
too deep into it I’d like to make sure I actually understand it correctly.
It would be great if you could have a quick look at my pseudo code (mostly
stolen from michi).

Given a Node with the fields

* color
* visits
* wins
* amaf_visits
* amaf_wins

The tree is updated after a playout in the following way:

We traverse the tree according to the moves played. visits gets incremented
unconditionally, and wins gets incremented if the playout was a win for
color. That is the same as UCT.

Then we have a look at the children of the node and increment amaf_visits
for the children if color of that particular (child) node was the first to
play on that intersection in the playout. If the playout was also a win for
the (child) node then we also increment amaf_wins.
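
The update described above could be sketched in Python roughly as
follows. This is only a sketch under assumptions: the move and children
fields, the update_tree name, and the first_player_at map (intersection
-> color that first played there in the playout) are hypothetical and
not part of the field list above. It also assumes a node's color is the
color that played the move leading to that node, and it uses one map
for the whole playout rather than only the moves after each node.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    color: str                   # color that played the move into this node (assumption)
    move: Optional[str] = None   # intersection of that move; None for the root
    visits: int = 0
    wins: int = 0
    amaf_visits: int = 0
    amaf_wins: int = 0
    children: List["Node"] = field(default_factory=list)


def update_tree(path: List[Node], first_player_at: Dict[str, str], winner: str) -> None:
    """Update UCT and AMAF statistics along the path taken through the tree."""
    for node in path:
        # Plain UCT part: every node on the path gets a visit.
        node.visits += 1
        if winner == node.color:
            node.wins += 1
        # AMAF part: credit each child whose color was first to play its move.
        for child in node.children:
            if child.move is None:
                continue
            if first_player_at.get(child.move) == child.color:
                child.amaf_visits += 1
                if winner == child.color:
                    child.amaf_wins += 1
```

Note that siblings that were not on the path still get AMAF credit,
which is the point of RAVE: they gain statistics before being visited.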


Then we also need to change the formula to select the next node. I must
admit I just copied the one from Michi (RAVE_EQUIV = 3500. Stolen from
Michi):

win_rate = wins / visits (assumes visits will never be 0)
if amaf_visits == 0 {
   return win_rate
} else {
   rave_win_rate = amaf_wins / amaf_visits
   beta = amaf_visits / (amaf_visits + visits + visits * amaf_visits / RAVE_EQUIV)
   return beta * rave_win_rate + (1 - beta) * win_rate
}
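
The same blend as a small runnable Python function, as a sketch only.
The name rave_value and the argument order are hypothetical; the
constant 3500 is the value quoted above.

```python
RAVE_EQUIV = 3500  # value quoted above from Michi


def rave_value(wins: int, visits: int, amaf_wins: int, amaf_visits: int,
               rave_equiv: float = RAVE_EQUIV) -> float:
    """Blend the plain win rate with the AMAF win rate (hypothetical helper)."""
    if visits <= 0:
        raise ValueError("visits must be positive")
    win_rate = wins / visits
    if amaf_visits == 0:
        # No AMAF data yet: fall back to the plain win rate.
        return win_rate
    rave_win_rate = amaf_wins / amaf_visits
    # beta is the weight of the AMAF estimate; it shrinks as visits grow.
    beta = amaf_visits / (amaf_visits + visits + visits * amaf_visits / rave_equiv)
    return beta * rave_win_rate + (1 - beta) * win_rate
```

With few real visits beta is close to amaf_visits / (amaf_visits +
visits), so the AMAF estimate carries real weight; with many visits
beta tends to 0 and the value approaches the plain win rate.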

Obviously I’m not expecting anyone to actually check the formula, but it
would be great if I could get a thumbs up (or down) on the general idea.

Cheers,

Urban


_______________________________________________
Computer-go mailing list
[email protected]
http://computer-go.org/mailman/listinfo/computer-go
