Re: [Computer-go] MCTS vs Minimax

Álvaro Begué Sat, 30 Oct 2010 15:24:46 -0700

On Sat, Oct 30, 2010 at 12:26 PM, Willemien <[email protected]> wrote:
> On Sat, Oct 30, 2010 at 5:00 PM, Álvaro Begué <[email protected]> wrote:
>> On Sat, Oct 30, 2010 at 11:35 AM, Willemien <[email protected]> wrote:
>>> On life in 19x 19 
>>> http://www.lifein19x19.com/forum/viewtopic.php?p=37588#p37588
>>>
>>> i found the statement
>>> .
>>> " I think there exists a proof that MCTS gives the same results as
>>> normal minimax search when the number of playouts goes to infinity."
>>>
>>> Does this proof really exist and where can i find it?
>>
>> I haven't seen a formal proof, but it doesn't seem hard. Imagine you
>> are in a node such that every move leads to a terminal node. Since
>> UCB1 (or whatever other approach to the multi-armed bandit problem you
>> are using) will pick the best move a fraction of the time that
>> converges to 1, the average result it will return when exploring it
>> will be the correct minimax value of the [depth one] tree. Now proceed
>> by induction on the depth of the tree.
>>
>
> That is what i don't believe, for having a  chanche of 1 for every
> move you must be able to go back on a move (the move was not as bad as
> tought before) and that mechanism is  i missing in MCTS


This seems to indicate that you didn't understand how move selection
works in UCB1. If you play infinitely many times, every move will be
tried infinitely many times, but the best move will be played almost
always. A move that didn't look as good in earlier playouts will have
a large variance, which will eventually make it be picked.

Think about it this way: At every stage you pick the move that
maximizes average+S*standard_deviation, where S stands for "Something"
:). If S is zero, you'll simply select that move that has the best
average. If S is infinity, you'll pick the move that has been tried
the least often (largest standard deviation). If S is somewhere in
between, there is a balance. But since S is typically something like
sqrt(log(number_of_visits)), it tends to infinity, so every move will
be given another chance eventually. Since S grows very slowly, the
best move will still be picked most of the time. If you work out the
Math, you'll find that the best move is picked with probability 1
asymptotically.
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go

Re: [Computer-go] MCTS vs Minimax

Reply via email to