Jason House wrote:
Darren Cook wrote:
Hi Jason,
In UCT the monte carlo searches (I find it clearer to call them "the
playouts") are always run to the end of the game. So they always
accurately (well, as accurate as a random playout can be!) take sente in
account. Therefore my understanding is that it does not matter whether
they start at an even or odd ply in the UCT part of the tree.

Darren

Let's take a basic example of a leaf node in an MC search tree that hasn't been expanded, but has 4 children. Let's say that random simulation through the children have winning percentages of {46%, 51%, 47%, 48%}. Assuming a uniform simulation policy, the winning percentage would be the average of the four, or 48%... but when that node gets expanded, it'll start discovering the winning percentages of the children nodes. Now when we look at the node that used to be our leaf node and ask what it's winning percentage would be, we come to a different conclusion... The winning percentage is either 46% or 51% depending on if the color to move.

In this example, the difference between the 0-ply and the 1-play is 3% (51%-48%) or 2% (48%-46%) depending on what the color to move was. Does that help?

If I understand what you mean, backing-up the average of playouts from a position has a tendency to under-estimate that position from the point of view of the player to move, because if the playout would start by the best move instead of a random move, they would produce a better value. You fear that it may be a problem when evaluating positions at different depth parities. But it is not.

The consequence of this bias in UCT, is that when a move has been searched for only a few simulations, it tends to be over-evaluated. This is not a big problem, because if it is over-evaluated, then it means that it will be searched, and the over-evaluation will be corrected eventually.

Over-estimating a move is not a big problem in UCT, but under-estimating it would have very bad consequence. So, in practice it is better to have optimistic move evaluations, especially when they have not been searched a lot. I had not understood this when I wrote my Turin paper, where I tried to get unbiased estimates of position values. UCT works better because it gives optimistic move values.

If you think hard about it, you'll find that the bias in UCT evaluation helps the search, even though leaves at are different depths. Doing alpha-beta with Monte-Carlo evaluations at the leaves at different depths would fail badly (I vaguely remember Bruno Bouzy may have reported this phenomenon, already). But alpha-beta's back-up is min-max. With the smooth average backup of UCT, everything works very well.

Rémi
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/

Reply via email to