No, because some strong programs use winrate + RAVE + prior bias. The RAVE term can provide enough exploration to remove the need for the UCT term.
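
As a rough illustration of selecting on winrate + RAVE + prior bias with no UCT exploration term, here is a minimal sketch. Everything in it is an assumption made for illustration and not any particular program's code: the `Child` fields, the constants `RAVE_K` and `PRIOR_WEIGHT`, the beta schedule sqrt(k / (3n + k)), and the way the prior decays with visits.

```python
import math
from dataclasses import dataclass

# Hypothetical tuning constants, chosen only for illustration.
RAVE_K = 1000.0       # visits at which RAVE and the real winrate get similar weight
PRIOR_WEIGHT = 10.0   # how many visits it takes for the prior bias to halve


@dataclass
class Child:
    move: str
    visits: int = 0
    wins: float = 0.0
    rave_visits: int = 0
    rave_wins: float = 0.0
    prior: float = 0.0  # hypothetical heuristic bonus, e.g. from patterns


def score(c: Child) -> float:
    """Winrate + RAVE + prior bias, with no sqrt(c*ln(N)/M) term."""
    winrate = c.wins / c.visits if c.visits else 0.5
    rave = c.rave_wins / c.rave_visits if c.rave_visits else 0.5
    # beta is 1 for an unvisited move and shrinks as real visits accumulate,
    # so RAVE drives the early choice and the real winrate takes over later.
    beta = math.sqrt(RAVE_K / (3.0 * c.visits + RAVE_K))
    # The prior bias fades as the move collects real visits.
    bias = PRIOR_WEIGHT * c.prior / (c.visits + PRIOR_WEIGHT)
    return (1.0 - beta) * winrate + beta * rave + bias


def select(children: list[Child]) -> Child:
    """Pick the child with the highest combined score."""
    return max(children, key=score)
```

In this sketch a move that has never been tried in the tree can still be selected when it did well in playouts elsewhere (high RAVE value), which is the kind of exploration the UCT term would otherwise have to supply.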

David

> -----Original Message-----
> From: [email protected] [mailto:computer-go-[email protected]] On Behalf Of ???? ??????
> Sent: Tuesday, October 26, 2010 10:14 PM
> To: computer-go
> Subject: Re: [Computer-go] Monte Carlo (upper confidence bounds applied to trees)
>
> > With uniformly distributed playouts, it would be something around
> > c=0.2 in sqrt(c*ln(N)/M), with much more sophisticated heuristics and
> > good prior biasing of the node values and then RAVE, c will approach 0
> > as the need for UCB-driven exploration will decrease.
>
> Thank you.
>
> But exploration coefficient C can't be equal to 0 ? Because if it's equal,
> then we return to the situation which first post of this thread described
> (we use only WinRate).
>
> Another question: what to do when the game is over in the Tree Policy, not
> in the Default Policy? Do we have to make the program not to select this
> node any more (not to call procedure PlaySimulation for this node)?
> _______________________________________________
> Computer-go mailing list
> [email protected]
> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
