Jason House wrote:
Darren Cook wrote:
Hi Jason,
In UCT the monte carlo searches (I find it clearer to call them "the
playouts") are always run to the end of the game. So they always
accurately (well, as accurate as a random playout can be!) take sente in
account. Therefore my understanding is that it does not matter whether
they start at an even or odd ply in the UCT part of the tree.
Darren
Let's take a basic example of a leaf node in an MC search tree that
hasn't been expanded, but has 4 children. Let's say that random
simulation through the children have winning percentages of {46%, 51%,
47%, 48%}. Assuming a uniform simulation policy, the winning percentage
would be the average of the four, or 48%... but when that node gets
expanded, it'll start discovering the winning percentages of the
children nodes. Now when we look at the node that used to be our leaf
node and ask what it's winning percentage would be, we come to a
different conclusion... The winning percentage is either 46% or 51%
depending on if the color to move.
In this example, the difference between the 0-ply and the 1-play is 3%
(51%-48%) or 2% (48%-46%) depending on what the color to move was. Does
that help?
If I understand what you mean, backing-up the average of playouts from a
position has a tendency to under-estimate that position from the point
of view of the player to move, because if the playout would start by the
best move instead of a random move, they would produce a better value.
You fear that it may be a problem when evaluating positions at different
depth parities. But it is not.
The consequence of this bias in UCT, is that when a move has been
searched for only a few simulations, it tends to be over-evaluated. This
is not a big problem, because if it is over-evaluated, then it means
that it will be searched, and the over-evaluation will be corrected
eventually.
Over-estimating a move is not a big problem in UCT, but under-estimating
it would have very bad consequence. So, in practice it is better to have
optimistic move evaluations, especially when they have not been searched
a lot. I had not understood this when I wrote my Turin paper, where I
tried to get unbiased estimates of position values. UCT works better
because it gives optimistic move values.
If you think hard about it, you'll find that the bias in UCT evaluation
helps the search, even though leaves at are different depths. Doing
alpha-beta with Monte-Carlo evaluations at the leaves at different
depths would fail badly (I vaguely remember Bruno Bouzy may have
reported this phenomenon, already). But alpha-beta's back-up is min-max.
With the smooth average backup of UCT, everything works very well.
Rémi
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/