On Sat, Oct 30, 2010 at 12:26 PM, Willemien <[email protected]> wrote: > On Sat, Oct 30, 2010 at 5:00 PM, Álvaro Begué <[email protected]> wrote: >> On Sat, Oct 30, 2010 at 11:35 AM, Willemien <[email protected]> wrote: >>> On life in 19x 19 >>> http://www.lifein19x19.com/forum/viewtopic.php?p=37588#p37588 >>> >>> i found the statement >>> . >>> " I think there exists a proof that MCTS gives the same results as >>> normal minimax search when the number of playouts goes to infinity." >>> >>> Does this proof really exist and where can i find it? >> >> I haven't seen a formal proof, but it doesn't seem hard. Imagine you >> are in a node such that every move leads to a terminal node. Since >> UCB1 (or whatever other approach to the multi-armed bandit problem you >> are using) will pick the best move a fraction of the time that >> converges to 1, the average result it will return when exploring it >> will be the correct minimax value of the [depth one] tree. Now proceed >> by induction on the depth of the tree. >> > > That is what i don't believe, for having a chanche of 1 for every > move you must be able to go back on a move (the move was not as bad as > tought before) and that mechanism is i missing in MCTS
This seems to indicate that you didn't understand how move selection works in UCB1. If you play infinitely many times, every move will be tried infinitely many times, but the best move will be played almost always. A move that didn't look as good in earlier playouts will have a large variance, which will eventually make it be picked. Think about it this way: At every stage you pick the move that maximizes average+S*standard_deviation, where S stands for "Something" :). If S is zero, you'll simply select that move that has the best average. If S is infinity, you'll pick the move that has been tried the least often (largest standard deviation). If S is somewhere in between, there is a balance. But since S is typically something like sqrt(log(number_of_visits)), it tends to infinity, so every move will be given another chance eventually. Since S grows very slowly, the best move will still be picked most of the time. If you work out the Math, you'll find that the best move is picked with probability 1 asymptotically. _______________________________________________ Computer-go mailing list [email protected] http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
