[computer-go] analysis of UCT and other bandit algorithms for tree search

Remi Munos Tue, 24 Apr 2007 07:01:53 -0700

Hi,

I have been encouraged to post a message to this list because we wrote a paper 
on the analysis of UCT and other bandit-based methods for tree search, and 
this might be useful for computer-go.


The paper "Bandit Algorithms for Tree Search" is available as an INRIA 
technical report at: http://hal.inria.fr/inria-00136198

In short, we start by explaining why UCT may perform very poorly in some cases 
because of excessive optimism. The regret of UCT is asymptotically O(log n) 
but with a initial period that may be prohibitively long.

In order to avoid this problem, we need to modify the confidence sequence so 
that to increase exploration when it is necessary. We propose an algorithm 
(bandit algorithm in smooth trees) that takes into account possible 
smoothness in the tree, namely that the "bias" of the rewards along an 
optimal branch is decreasing with the depth of the considered nodes in the 
branch, where the "bias" of a node is defined as the difference between the 
value of the node and the expectation of the rewards obtained from that node 
(ie. the difference between the max, or min-max, value and a Monte-Carlo 
estimate of its value). Like UCT, the choice of an action depends on upper 
confidence bounds, but here, the bound of each node is computed in terms of 
the bounds of its children, as well as a direct Chernoff-Hoeffding bound 
based on the average rewards and the smoothness of the tree. We show that the 
algorithm spend a O(log(n)) time on suboptimal branches, thus essentially 
exploring the optimal branch. The prior smoothness coefficient of the tree is 
a parameter of the algorithm. If set to 0, this algorithm is nothing else 
than UCT. If set to infinity, this algo is a flat-UCB performed on the 
leaves. If a correct parameter is set, it performs better than both extremes. 
Of course the setting of a correct value of the parameter for the specific 
game of go is left to the experts, which I am not, unfortunately!

Since several go programs are using UCT ideas, we thought this modified 
version may be of interest. The paper consider the max-max tree case but 
extension to min-max search is straightforward.

Best, Remi Munos

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

[computer-go] analysis of UCT and other bandit algorithms for tree search

Reply via email to