Hi, I have been encouraged to post a message to this list because we wrote a paper on the analysis of UCT and other bandit-based methods for tree search, and this might be useful for computer-go.
The paper "Bandit Algorithms for Tree Search" is available as an INRIA technical report at: http://hal.inria.fr/inria-00136198 In short, we start by explaining why UCT may perform very poorly in some cases because of excessive optimism. The regret of UCT is asymptotically O(log n) but with a initial period that may be prohibitively long. In order to avoid this problem, we need to modify the confidence sequence so that to increase exploration when it is necessary. We propose an algorithm (bandit algorithm in smooth trees) that takes into account possible smoothness in the tree, namely that the "bias" of the rewards along an optimal branch is decreasing with the depth of the considered nodes in the branch, where the "bias" of a node is defined as the difference between the value of the node and the expectation of the rewards obtained from that node (ie. the difference between the max, or min-max, value and a Monte-Carlo estimate of its value). Like UCT, the choice of an action depends on upper confidence bounds, but here, the bound of each node is computed in terms of the bounds of its children, as well as a direct Chernoff-Hoeffding bound based on the average rewards and the smoothness of the tree. We show that the algorithm spend a O(log(n)) time on suboptimal branches, thus essentially exploring the optimal branch. The prior smoothness coefficient of the tree is a parameter of the algorithm. If set to 0, this algorithm is nothing else than UCT. If set to infinity, this algo is a flat-UCB performed on the leaves. If a correct parameter is set, it performs better than both extremes. Of course the setting of a correct value of the parameter for the specific game of go is left to the experts, which I am not, unfortunately! Since several go programs are using UCT ideas, we thought this modified version may be of interest. The paper consider the max-max tree case but extension to min-max search is straightforward. Best, Remi Munos _______________________________________________ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/