On Sat, Oct 30, 2010 at 11:35 AM, Willemien <[email protected]> wrote:
> On life in 19x 19 
> http://www.lifein19x19.com/forum/viewtopic.php?p=37588#p37588
>
> i found the statement
> .
> " I think there exists a proof that MCTS gives the same results as
> normal minimax search when the number of playouts goes to infinity."
>
> Does this proof really exist and where can i find it?

I haven't seen a formal proof, but it doesn't seem hard. Imagine you
are at a node where every move leads to a terminal node. Since
UCB1 (or whatever other approach to the multi-armed bandit problem you
are using) picks the best move a fraction of the time that
converges to 1, the average result returned from that node
converges to the correct minimax value of the [depth-one] tree. Now
proceed by induction on the depth of the tree.
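To make the base case concrete, here is a minimal sketch (my own
illustration, not something from the thread): UCB1 run at a root whose
moves all lead to terminal positions with fixed win probabilities. The
win probabilities and the constant c are arbitrary choices for the
example. As playouts grow, the best move absorbs almost all the visits,
so the root's average converges to the depth-one minimax value.

```python
import math
import random

def ucb1_depth_one(win_probs, num_playouts, c=1.4, seed=0):
    """Run UCB1 over moves whose terminal win probabilities are given.

    Returns the index of the most-visited move, the fraction of playouts
    it received, and its empirical mean reward.
    """
    rng = random.Random(seed)
    n = [0] * len(win_probs)    # visit count per move
    w = [0.0] * len(win_probs)  # accumulated reward per move
    for t in range(1, num_playouts + 1):
        if t <= len(win_probs):
            i = t - 1           # play each move once before using the formula
        else:
            # UCB1 index: empirical mean plus exploration bonus
            i = max(range(len(win_probs)),
                    key=lambda j: w[j] / n[j]
                                  + c * math.sqrt(math.log(t) / n[j]))
        n[i] += 1
        w[i] += 1.0 if rng.random() < win_probs[i] else 0.0
    best = max(range(len(win_probs)), key=lambda j: n[j])
    return best, n[best] / num_playouts, w[best] / n[best]

best, frac, mean = ucb1_depth_one([0.3, 0.5, 0.7], 20000)
```

With enough playouts, `best` identifies the 0.7 move and `mean` is close
to 0.7, which is exactly the claim used in the induction's base case.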

> also because MCTS still needs to learn to work with draws How can it be true?

Well, it could be true just for games with no draws, so I don't see a
contradiction. However, there are obvious ways to extend MCTS to deal
with draws, or with any other reward scheme. Instead of maximizing the
probability of winning, you maximize the expected reward, and in the
UCB formula you plug in an estimate of the standard deviation
computed from the samples you have so far. The proof I outlined above
would still work.
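A sketch of that extension (again my own illustration, with arbitrary
example numbers): score a draw as reward 0.5, keep a running sum of
squared rewards per move, and use the sample variance in the exploration
term in the spirit of UCB1-Tuned. Each move here is described by a
(p_win, p_draw) pair.

```python
import math
import random

def ucb_expected_reward(outcomes, num_playouts, seed=1):
    """outcomes[j] = (p_win, p_draw); reward is 1 for a win, 0.5 for a
    draw, 0 for a loss. Maximizes expected reward rather than win rate.

    Returns the most-visited move and its empirical mean reward.
    """
    rng = random.Random(seed)
    k = len(outcomes)
    n = [0] * k      # visit counts
    s = [0.0] * k    # sum of rewards
    s2 = [0.0] * k   # sum of squared rewards, for the variance estimate
    for t in range(1, num_playouts + 1):
        if t <= k:
            i = t - 1
        else:
            def index(j):
                mean = s[j] / n[j]
                # sample variance plus a correction term, as in UCB1-Tuned
                v = s2[j] / n[j] - mean * mean \
                    + math.sqrt(2 * math.log(t) / n[j])
                return mean + math.sqrt(math.log(t) / n[j] * min(0.25, v))
            i = max(range(k), key=index)
        p_win, p_draw = outcomes[i]
        u = rng.random()
        r = 1.0 if u < p_win else (0.5 if u < p_win + p_draw else 0.0)
        n[i] += 1
        s[i] += r
        s2[i] += r * r
    best = max(range(k), key=lambda j: n[j])
    return best, s[best] / n[best]

# Move 0 wins 40% and never draws (expected reward 0.4);
# move 1 wins 30% and draws 60% (expected reward 0.6).
best, mean = ucb_expected_reward([(0.4, 0.0), (0.3, 0.6)], 20000)
```

The search settles on the second move even though it wins less often,
because its expected reward is higher, which is the point of the
generalization.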

Álvaro.
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
