I'm contemplating making the change you suggest. The following is my
one concern.

Suppose, to keep the example simple, that there are only two choices
at each ply. My tree is originally

ROOT 0

meaning that there is just one node with no playouts.

In the first playout, my first move is A, so then I have:

ROOT 1
        A 1

Now I try move B, updating the tree to:

ROOT 2
        A 1
        B 1

Fine so far. Now UCT likes A better, so the next playout starts with
A, C, giving me:

ROOT 3
        A 2
                C 1
        B 1

Here's the problem. On the next playout, I'll want to look at the
other move available after A. In doing so, I will need to compute the UCT
value of trying C again, especially if (as in the Gelly tech report)
I don't automatically choose untried moves over tried moves. When I
look through the children of A and count a total of one playout, it
seems natural that I should update the playout count for A:

ROOT 3
        A 1
                C 1
        B 1

Now I actually add the new move:

ROOT 4
        A 2
                C 1
                D 1
        B 1

On the next playout, I will begin by looking through the children of
ROOT and updating ROOT's run count:

ROOT 3
        A 2
                C 1
                D 1
        B 1

The tree is accurate now, but I've lost a playout! I will, in fact,
lose one playout every time some node gains its second child. Is this
acceptable?

(The number of playouts at the root doesn't really matter, except
that I can't just count the number of runs through the root to see
how many playouts I did. More important is that every time some node
in the subtree rooted at node X gains a second child, X loses a
playout.)
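
To make the bookkeeping concrete, here is a minimal Python sketch that
replays the walkthrough above. The names (Node, resync_from_children,
playout) are hypothetical and not taken from any actual program; the
sketch only assumes that each node stores its own playout count and
that the count is overwritten with the children's total whenever the
children are scanned.

```python
class Node:
    """Hypothetical tree node that stores its own playout count."""

    def __init__(self, name):
        self.name = name
        self.visits = 0
        self.children = {}

    def child(self, name):
        # Create the child on first use, as in the walkthrough.
        if name not in self.children:
            self.children[name] = Node(name)
        return self.children[name]

    def resync_from_children(self):
        # The step in question: overwrite the stored count with the
        # total found while scanning the children.
        self.visits = sum(c.visits for c in self.children.values())


def playout(root, moves):
    # Record one playout along the given path of moves.
    node = root
    node.visits += 1
    for move in moves:
        node = node.child(move)
        node.visits += 1


root = Node("ROOT")
playout(root, ["A"])          # ROOT 1, A 1
playout(root, ["B"])          # ROOT 2, A 1, B 1
playout(root, ["A", "C"])     # ROOT 3, A 2, C 1, B 1

a = root.children["A"]
a.resync_from_children()      # A drops from 2 to 1
playout(root, ["A", "D"])     # ROOT 4, A 2, C 1, D 1, B 1
root.resync_from_children()   # ROOT drops from 4 to 3

playouts_run = 4
print(root.visits, playouts_run)
```

Four playouts were run, but ROOT reports three. The missing one is the
playout that first created A: it is counted in A itself but in none of
A's children, so it disappears when A is recomputed from them.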


"it seems natural that I should update the playout count for A"  ???

That doesn't seem natural to me.  You've done two playouts from A so
why would you update it to 1?

I've changed my code to do what Eric says instead of simply using the
parent value.  This lets me initialize each new UCT node's visit count
and black win count however I wish and still be confident my UCT
equation is valid.  Maybe I'll want to initialize the new node to
something other than 0 playouts and 0 wins if I think this node has a
better-than-average chance of being the correct move.
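
Here is a rough Python sketch of the idea, assuming the suggestion
amounts to using the sum of the children's visit counts as the total
in the UCT formula rather than a count stored on the parent. The
function names, the exploration constant c, and the (wins, visits)
pairs are illustrative assumptions, and for simplicity wins are counted
for the side choosing the move rather than as a black win count.

```python
import math


def uct_value(wins, visits, total_visits, c=1.0):
    # UCB1-style value: observed win rate plus an exploration bonus.
    # c is an assumed tuning constant; visits must be positive.
    return wins / visits + c * math.sqrt(math.log(total_visits) / visits)


def select(children, c=1.0):
    # children maps a move name to a (wins, visits) pair.
    # The total is recomputed from the children on every call, so any
    # prior counts given to a new child are included consistently.
    total = sum(visits for _, visits in children.values())
    return max(children,
               key=lambda move: uct_value(*children[move], total, c))


# A new child can start from a prior instead of 0 wins and 0 playouts.
# Here "D" is seeded with 2 wins out of 3 visits (an arbitrary choice).
children_of_a = {"C": (1, 1), "D": (2, 3)}
best = select(children_of_a)
```

Because the total is derived from the same numbers that appear in each
child's term, seeding a node with extra visits cannot leave the parent
and its children out of step.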
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
