Re: [computer-go] Depth dependent evaluation effects on monte carlo searches

Rémi Coulom Sat, 09 Jun 2007 04:29:46 -0700

Jason House wrote:

Darren Cook wrote:
Hi Jason,
In UCT the monte carlo searches (I find it clearer to call them "the
playouts") are always run to the end of the game. So they always
accurately (well, as accurate as a random playout can be!) take sente in
account. Therefore my understanding is that it does not matter whether
they start at an even or odd ply in the UCT part of the tree.
Darren
Let's take a basic example of a leaf node in an MC search tree thathasn't been expanded, but has 4 children. Let's say that randomsimulation through the children have winning percentages of {46%, 51%,47%, 48%}. Assuming a uniform simulation policy, the winning percentagewould be the average of the four, or 48%... but when that node getsexpanded, it'll start discovering the winning percentages of thechildren nodes. Now when we look at the node that used to be our leafnode and ask what it's winning percentage would be, we come to adifferent conclusion... The winning percentage is either 46% or 51%depending on if the color to move.
In this example, the difference between the 0-ply and the 1-play is 3%(51%-48%) or 2% (48%-46%) depending on what the color to move was. Doesthat help?

If I understand what you mean, backing-up the average of playouts from aposition has a tendency to under-estimate that position from the pointof view of the player to move, because if the playout would start by thebest move instead of a random move, they would produce a better value.You fear that it may be a problem when evaluating positions at differentdepth parities. But it is not.

The consequence of this bias in UCT, is that when a move has beensearched for only a few simulations, it tends to be over-evaluated. Thisis not a big problem, because if it is over-evaluated, then it meansthat it will be searched, and the over-evaluation will be correctedeventually.

Over-estimating a move is not a big problem in UCT, but under-estimatingit would have very bad consequence. So, in practice it is better to haveoptimistic move evaluations, especially when they have not been searcheda lot. I had not understood this when I wrote my Turin paper, where Itried to get unbiased estimates of position values. UCT works betterbecause it gives optimistic move values.

If you think hard about it, you'll find that the bias in UCT evaluationhelps the search, even though leaves at are different depths. Doingalpha-beta with Monte-Carlo evaluations at the leaves at differentdepths would fail badly (I vaguely remember Bruno Bouzy may havereported this phenomenon, already). But alpha-beta's back-up is min-max.With the smooth average backup of UCT, everything works very well.


Rémi
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Depth dependent evaluation effects on monte carlo searches

Reply via email to