In fact, MCTS (UCT without the +sqrt(log(....)/.....) converges also to optimal behavior, at least if winning rates are computed e.g. (nbOfWins+1)/(nbOfSimulations+2) for 2-player deterministic win/loss games. We published this in http://hal.inria.fr/inria-00437146_v1/
(finite horizon) Best regards, Olivier 2011/11/15 Brian Sheppard <[email protected]> > The assumptions under which UCT converges are the same as alpha-beta, > minimax, state-space search, A*, etc. > > "Converges" has a definition. You cannot redefine "converges" to preclude > large (but finite) amounts of resources. > > > -----Original Message----- > From: [email protected] > [mailto:[email protected]] On Behalf Of Dave Dyer > Sent: Monday, November 14, 2011 5:13 PM > To: [email protected]; [email protected] > Subject: Re: [Computer-go] convergence in UCT search > > At 12:40 PM 11/14/2011, Brian Sheppard wrote: > >My understanding is the opposite: UCT search with random playouts > converges > >to best play in every two-player game. > > That is true with infinite time and memory. In practice, > even very obviously non-optimum moves persist with any reasonable > amount of search resources. > > > _______________________________________________ > Computer-go mailing list > [email protected] > http://dvandva.org/cgi-bin/mailman/listinfo/computer-go > > _______________________________________________ > Computer-go mailing list > [email protected] > http://dvandva.org/cgi-bin/mailman/listinfo/computer-go > -- ========================================================= Olivier Teytaud -- [email protected] TAO, LRI, UMR 8623(CNRS - Universite Paris-Sud), bat 490 Universite Paris-Sud F-91405 Orsay Cedex France http://0z.fr/EJm0g (one of the 56.5 % of french who did not vote for Sarkozy in 2007)
_______________________________________________ Computer-go mailing list [email protected] http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
