Re: [computer-go] More UCT / Monte-Carlo questions

Jason House Tue, 05 Feb 2008 13:22:29 -0800

On Feb 5, 2008 2:22 PM, David Fotland <[EMAIL PROTECTED]> wrote:

>  Helps a little, but I'm still unclear on how a new node is handled.
>
>
>
> Do you have lightweight nodes (that only contain info on one positions,
> with pointers to children and siblings)?  Then when uct finds a node with no
> children, what do you do?  Mogo and Mark create a new node for each new
> simulation.  So do you create a single random child at this point and
> continue, or do you create all children, then pick one, and continue?
>
Leaves in my tree are identical to interior nodes except that they have an
empty child list.  When I reach a sufficient number of simulations
(settable, currently 50?), I expand the node and create new leaf nodes for
every valid move.  I then pick one child and continue.  When the sim for
that child finishes, I do a RAVE update of many children.  The RAVE list
gets a move prepended, and is passed to the parent for further RAVE
arithmetic.  Updating children with RAVE info is _NOT_ recursive.  It's a
shallow update.  This was done to match what others posted as their method
of implementing RAVE.




> The descriptions of RAVE are also pretty unclear.  I guess each node has a
> rave counter that counts how many times this move was seen during a
> playout.  When you backup data in the uct tree, do you visit every child and
> update the rave counters of the children?
>

There should be a thread in the mailing list where I was asking the same
questions...  I hope I answered the backup question above.  Every node in my
tree (leaf or otherwise) has the following fields:

int sims, wins;
int raveSims, raveWins;

I don't use a +1,-1 scheme by default (although my MC alpha-beta does).  Win
rate and UCB win rate is computed independently according to the formulas
given for mogo.  I also then combine the values in the same way as described
in the mogo paper.  I still use the UCB1-TUNED formula.

I too had a tough time deciphering the paper that discussed RAVE.  After
struggling with the paper, I posted my translation of the paper to the
mailing list.  It got zero responses :(



> It think many programs run several simulations through a node before
> allocating the children.  I can see how this saves memory, but then how do
> you save the RAVE information from the early simulations?
>
>
> For RAVE, after doing a sim at a leaf, I walk back up the UCT tree.  At
> each internal node, I keep update RAVE game counters applicable children.  I
> never go deeper into the tree to apply RAVE values.  Does that help?
>

_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] More UCT / Monte-Carlo questions

Reply via email to