> It will not continue indefinitely, since as the number of trials (in root,
> then in b1) goes up, the number of tests (for a1 and a2) stays fixed, and
> therefore the upper confidence bound of a1 and a2 increases. Basically,
> with the ratio M/N going down, we accept that a given move might, in the
> best case, have a higher and higher winning probability.
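To make the quoted argument concrete, here is a minimal sketch. It assumes the standard UCB1 bound, mean + c * sqrt(ln N / n), with c = sqrt(2), where N is the parent's visit count and n is the move's visit count. The move's record (4 wins in 10 tries) and the parent-visit counts are hypothetical. If n stays fixed while N grows, ln N grows without limit, so the bound keeps rising until the rarely tried move is selected again.

```python
import math


def ucb1(wins: float, visits: int, parent_visits: int, c: float = math.sqrt(2)) -> float:
    """UCB1 upper bound: mean reward plus c * sqrt(ln(N) / n)."""
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)


# Hypothetical move (think "a1"): tried 10 times with 4 wins, then left alone
# while the parent node keeps accumulating visits.
wins, visits = 4, 10
parent_counts = (100, 1_000, 10_000, 100_000)
bounds = [ucb1(wins, visits, n) for n in parent_counts]

for n, bound in zip(parent_counts, bounds):
    # The mean stays at 0.4, but the exploration term grows with ln(N).
    print(f"parent visits = {n:>7}: upper bound = {bound:.3f}")
```

Because ln N is unbounded, the bound eventually exceeds that of any sibling that is being visited often. A sibling's own exploration term shrinks as its visit count grows with N.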
Thank you very much for your answer. More questions, if I may.

1. How does UCT work for other games, for example chess, where the players
   can draw? Why does the winrate take only wins into account?

   Winrate := Wins/Visits

   Which is better: 3 wins, 0 draws, 7 losses, or 2 wins, 6 draws, 2 losses?

2. After we play all the simulations, which move should the program choose?
   The one with the maximum Winrate (Wins/Visits), the one with the maximum
   Winrate + SQRT(ln(...)), or the one with the maximum (Wins+Draw/2)/Visits?

3. Why are there so many variations in how the UCT value is calculated
   (especially in what goes under the square root)?

   Valkyria (22 September 2006) uses this:
   uct := UCTK*Sqrt(ln(n.Visits)/(5*next.Visits))

   Orego uses this:
   sqrt(2 * logParentRunCount / node.getRuns(move))

   and Fuego uses this:
   m_biasTermConstant * sqrt(logPosCount / (moveCount + 1));

_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
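A small sketch may help compare the options raised in questions 1 and 3. It scores a draw as half a win, which is the (Wins+Draw/2)/Visits option from question 2, and applies it to the two records from question 1. It also writes the three quoted exploration terms side by side, with N as the parent's visit count and n as the child's.

Several things here are assumptions:

- The function names are hypothetical.
- Fuego's logPosCount is taken to be the log of the parent's count.
- The values of UCTK and m_biasTermConstant are left as free parameters because they are not given in the post.

```python
import math


def winrate_with_draws(wins: int, draws: int, visits: int) -> float:
    """Hypothetical scoring: a draw counts as half a win, (wins + draws / 2) / visits."""
    return (wins + draws / 2) / visits


# The two records from question 1 (10 playouts each).
record_a = winrate_with_draws(3, 0, 10)  # 3 wins, 0 draws, 7 losses
record_b = winrate_with_draws(2, 6, 10)  # 2 wins, 6 draws, 2 losses


# Exploration terms as quoted in question 3 (N = parent visits, n = child visits).
def valkyria_term(N: int, n: int, uctk: float) -> float:
    # UCTK * sqrt(ln(N) / (5 * n)) is the same as (UCTK / sqrt(5)) * sqrt(ln(N) / n)
    return uctk * math.sqrt(math.log(N) / (5 * n))


def orego_term(N: int, n: int) -> float:
    # sqrt(2 * ln(N) / n): the UCB1 form, i.e. c = sqrt(2)
    return math.sqrt(2 * math.log(N) / n)


def fuego_term(N: int, n: int, bias_term_constant: float) -> float:
    # n + 1 keeps the term finite for a move that has not been visited yet
    return bias_term_constant * math.sqrt(math.log(N) / (n + 1))


print(f"3W/0D/7L scores {record_a:.2f}, 2W/6D/2L scores {record_b:.2f}")
print(valkyria_term(1000, 20, math.sqrt(10)), orego_term(1000, 20))
```

At least in the form quoted here, all three terms are c * sqrt(ln N / n) with a different constant. Valkyria's term matches Orego's when UCTK = sqrt(10). Fuego's differs by using n + 1 in the denominator, which makes it slightly smaller for small n and keeps it defined at n = 0.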
