On Sat, Oct 30, 2010 at 5:00 PM, Álvaro Begué <[email protected]> wrote:
> On Sat, Oct 30, 2010 at 11:35 AM, Willemien <[email protected]> wrote:
>> On Life in 19x19
>> http://www.lifein19x19.com/forum/viewtopic.php?p=37588#p37588
>>
>> I found the statement:
>> " I think there exists a proof that MCTS gives the same results as
>> normal minimax search when the number of playouts goes to infinity."
>>
>> Does this proof really exist, and where can I find it?
>
> I haven't seen a formal proof, but it doesn't seem hard. Imagine you
> are in a node such that every move leads to a terminal node. Since
> UCB1 (or whatever other approach to the multi-armed bandit problem you
> are using) will pick the best move a fraction of the time that
> converges to 1, the average result it will return when exploring it
> will be the correct minimax value of the [depth one] tree. Now proceed
> by induction on the depth of the tree.
>
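To make the depth-one case concrete, here is a minimal Python sketch (not from the thread; the arm values and constants are illustrative): the arms of a UCB1 bandit stand in for moves leading to terminal nodes with the given win probabilities. Because UCB1 plays the best arm a fraction of the time converging to 1, the most-played arm identifies the correct move and the node's running average converges to the max, which is the induction step of the argument.

```python
import math
import random

def ucb1_best_arm(true_values, n_plays=50000, c=1.4):
    """Run UCB1 on a bandit whose arms are 'terminal nodes' returning
    Bernoulli rewards with the given win probabilities.  Returns the
    index of the most-played arm, which converges to the argmax."""
    n = len(true_values)
    counts = [0] * n
    sums = [0.0] * n
    for t in range(1, n_plays + 1):
        if t <= n:
            arm = t - 1          # play each arm once first
        else:                    # then pick by UCB1 score
            arm = max(range(n), key=lambda i: sums[i] / counts[i]
                      + c * math.sqrt(math.log(t) / counts[i]))
        reward = 1.0 if random.random() < true_values[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return max(range(n), key=lambda i: counts[i])

random.seed(0)
print(ucb1_best_arm([0.2, 0.5, 0.8]))
```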

That is what I don't believe. For the best move to be chosen with a
frequency converging to 1, the search must be able to go back to a move
(one that was not as bad as thought before), and that mechanism is what
I find missing in MCTS.



>> Also, since MCTS still needs to learn how to work with draws, how can it be true?
>
> Well, it could be true just for games with no draws, so I don't see a
> contradiction. However, there are obvious ways to extend MCTS to deal
> with draws, or with any other reward scheme. Instead of maximizing the
> probability of winning, you maximize the expected reward, and in the
> UCB formula you plug in an estimate of the standard deviation
> computed from the samples you have so far. The proof I outlined above
> would still work.
>
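A quick sketch of the extension to draws (again illustrative, not from the thread; the win/draw probabilities are made up): score a draw 0.5 and let the same UCB1 loop maximize the expected reward rather than the raw win probability. Note that the arm below with the highest win probability is not the one with the highest expected reward, and the bandit still settles on the latter.

```python
import math
import random

# Hypothetical arms as (p_win, p_draw); a draw is scored 0.5, so the
# expected rewards are 0.40, 0.60, and 0.50 respectively.  Arm 2 has
# the highest win probability, but arm 1 has the highest expectation.
ARMS = [(0.40, 0.00), (0.35, 0.50), (0.50, 0.00)]

def reward(p_win, p_draw):
    """Sample one playout result: win = 1.0, draw = 0.5, loss = 0.0."""
    r = random.random()
    return 1.0 if r < p_win else (0.5 if r < p_win + p_draw else 0.0)

random.seed(1)
counts = [0] * len(ARMS)
sums = [0.0] * len(ARMS)
for t in range(1, 50001):
    if t <= len(ARMS):
        i = t - 1                # play each arm once first
    else:                        # then maximize mean reward + UCB bonus
        i = max(range(len(ARMS)),
                key=lambda j: sums[j] / counts[j]
                + 1.4 * math.sqrt(math.log(t) / counts[j]))
    counts[i] += 1
    sums[i] += reward(*ARMS[i])

# Index of the most-played arm after 50000 plays:
print(max(range(len(ARMS)), key=lambda j: counts[j]))
```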
> Álvaro.
> _______________________________________________
> Computer-go mailing list
> [email protected]
> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
>