On Sat, Oct 30, 2010 at 5:00 PM, Álvaro Begué <[email protected]> wrote:
> On Sat, Oct 30, 2010 at 11:35 AM, Willemien <[email protected]> wrote:
>> On Life in 19x19
>> http://www.lifein19x19.com/forum/viewtopic.php?p=37588#p37588
>>
>> I found the statement:
>>
>> "I think there exists a proof that MCTS gives the same results as
>> normal minimax search when the number of playouts goes to infinity."
>>
>> Does this proof really exist, and where can I find it?
>
> I haven't seen a formal proof, but it doesn't seem hard. Imagine you
> are in a node such that every move leads to a terminal node. Since
> UCB1 (or whatever other approach to the multi-armed bandit problem you
> are using) will pick the best move a fraction of the time that
> converges to 1, the average result it returns when exploring the node
> will converge to the correct minimax value of the [depth-one] tree.
> Now proceed by induction on the depth of the tree.
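Álvaro's depth-one argument is easy to check empirically. Below is a minimal sketch (my own, not from the thread) of UCB1 run on a depth-one tree whose moves lead directly to terminal nodes; the terminal values are made up for illustration. Because the best move's share of playouts tends to 1, the running average approaches the minimax value of the tree.

```python
import math

def ucb1_average(values, n_playouts, c=math.sqrt(2)):
    """Run UCB1 on a depth-one tree whose moves lead straight to
    terminal nodes with the given exact values; return the average
    reward over all playouts."""
    k = len(values)
    counts = [0] * k       # visits per move
    sums = [0.0] * k       # accumulated reward per move
    total = 0.0
    for t in range(1, n_playouts + 1):
        if t <= k:
            arm = t - 1    # play every move once first
        else:
            arm = max(range(k),
                      key=lambda i: sums[i] / counts[i]
                      + c * math.sqrt(math.log(t) / counts[i]))
        r = values[arm]    # terminal node: the "playout" result is exact
        counts[arm] += 1
        sums[arm] += r
        total += r
    return total / n_playouts
```

With values like [0.0, 0.3, 1.0] and 100,000 playouts, the suboptimal moves receive only O(log n) visits, so the average lands within a fraction of a percent of the minimax value 1.0; this is the base case of the induction Álvaro describes.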
That is what I don't believe. For the chance of picking the best move
to converge to 1, you must be able to go back on a move (decide the
move was not as bad as thought before), and that mechanism is what I
think is missing in MCTS.

>> Also because MCTS still needs to learn to work with draws. How can
>> it be true?
>
> Well, it could be true just for games with no draws, so I don't see a
> contradiction. However, there are obvious ways to extend MCTS to deal
> with draws, or with any other reward scheme. Instead of maximizing the
> probability of winning, you maximize the expected reward, and in the
> UCB formula you plug in an estimate of the standard deviation
> computed from the samples you have so far. The proof I outlined above
> would still work.
>
> Álvaro.

_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
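Álvaro's suggestion for handling draws — maximize expected reward and plug a variance estimate into the UCB formula — is essentially what the UCB1-Tuned variant does. A minimal sketch (my own, with made-up win/draw probabilities): rewards are 1 for a win, 0.5 for a draw, and 0 for a loss, and each move's score is its mean reward plus an exploration bonus scaled by the sample variance.

```python
import math
import random

def ucb_tuned_average(arms, n_playouts, seed=0):
    """Bandit over stochastic moves with rewards in {0, 0.5, 1}
    (loss/draw/win). Each arm is a (p_win, p_draw) pair. Selection is
    UCB1-Tuned: mean reward plus a bonus using the sample variance.
    Returns the empirical mean reward of the whole run."""
    rng = random.Random(seed)
    k = len(arms)
    counts = [0] * k
    sums = [0.0] * k
    sqsums = [0.0] * k
    total = 0.0
    for t in range(1, n_playouts + 1):
        if t <= k:
            arm = t - 1                      # play every move once first
        else:
            def score(i):
                mean = sums[i] / counts[i]
                var = sqsums[i] / counts[i] - mean * mean
                # UCB1-Tuned variance term: capped at 1/4 (the maximum
                # variance of a [0, 1] reward) and padded so the
                # exploration bonus never vanishes.
                v = min(0.25, var + math.sqrt(2 * math.log(t) / counts[i]))
                return mean + math.sqrt(math.log(t) / counts[i] * v)
            arm = max(range(k), key=score)
        p_win, p_draw = arms[arm]
        u = rng.random()
        r = 1.0 if u < p_win else 0.5 if u < p_win + p_draw else 0.0
        counts[arm] += 1
        sums[arm] += r
        sqsums[arm] += r * r
        total += r
    return total / n_playouts
```

With arms [(0.4, 0.3), (0.6, 0.3)] the expected rewards are 0.55 and 0.75, so the running mean settles near 0.75: the bandit maximizes expected reward rather than win probability, which is exactly the extension the thread proposes for games with draws, and the convergence argument goes through unchanged.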
