On Sat, Oct 30, 2010 at 11:35 AM, Willemien <[email protected]> wrote:
> On Life in 19x19
> http://www.lifein19x19.com/forum/viewtopic.php?p=37588#p37588
>
> I found the statement:
>
> "I think there exists a proof that MCTS gives the same results as
> normal minimax search when the number of playouts goes to infinity."
>
> Does this proof really exist, and where can I find it?
I haven't seen a formal proof, but it doesn't seem hard. Imagine you are in a node where every move leads to a terminal node. Since UCB1 (or whatever other approach to the multi-armed bandit problem you are using) will pick the best move a fraction of the time that converges to 1, the average result returned for that node will converge to the correct minimax value of the depth-one tree. Now proceed by induction on the depth of the tree.

> also because MCTS still needs to learn to work with draws

How can that be true? Well, the claimed proof could hold just for games with no draws, so I don't see a contradiction. However, there are obvious ways to extend MCTS to deal with draws, or with any other reward scheme: instead of maximizing the probability of winning, you maximize the expected reward, and in the UCB formula you plug in an estimate of the standard deviation computed from the samples you have so far. The proof I outlined above would still work.

Álvaro.

_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
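The base case of the induction above can be sketched in a few lines. This is a minimal toy of my own (not from the thread): a depth-one "tree" whose moves lead straight to terminal rewards, scored with the expected-reward scheme (1 = win, 0.5 = draw, 0 = loss) and explored with the standard UCB1 rule using exploration constant sqrt(2). Because UCB1 picks the best move a fraction of the time converging to 1, the most-visited move ends up being the minimax-best one and its empirical mean is the backed-up value:

```python
import math

def ucb1_select(counts, values, total):
    """Pick the arm maximizing mean reward plus the UCB1 exploration bonus."""
    for i, c in enumerate(counts):
        if c == 0:
            return i  # visit every move at least once
    scores = [values[i] / counts[i] + math.sqrt(2.0 * math.log(total) / counts[i])
              for i in range(len(counts))]
    return scores.index(max(scores))

def run_bandit(rewards, n_playouts):
    """Depth-one 'tree': each move leads straight to a terminal reward.

    Rewards use the expected-reward scheme: 1 = win, 0.5 = draw, 0 = loss,
    so draws need no special handling.
    """
    counts = [0] * len(rewards)
    values = [0.0] * len(rewards)
    for t in range(1, n_playouts + 1):
        i = ucb1_select(counts, values, t)
        counts[i] += 1
        values[i] += rewards[i]
    # The most-visited move is the one the search recommends; its empirical
    # mean is the value backed up for this node.
    best = max(range(len(rewards)), key=lambda i: counts[i])
    return best, values[best] / counts[best]

best, value = run_bandit([0.0, 0.5, 1.0], n_playouts=10000)
print(best, value)  # move 2 (the win) dominates the visits; its mean is 1.0
```

The inductive step would replace each terminal reward with a recursive search of the subtree below that move, whose returned average converges to that subtree's minimax value by the same argument.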
