Hi,

> I haven't gotten to read through your paper as carefully as I'd like yet,
> but I do have a few observations that may be of benefit to other readers
> on the list... Mostly observation of assumptions used in the paper.
>
> 1. A max tree is used instead of minimax
Yes. For min-max trees, several things may change in the proofs, but the
results are similar and the BAST algorithm is basically the same. For an
even (max) node i, the bound Bi is the minimum of:

* the max of the children's bounds;
* the average reward obtained from node i + a confidence-interval term
  (similar to UCT) + a smoothness-constant term (which has to be carefully
  set, I guess mostly empirically).

And for an odd (min) node i, the bound Bi is the maximum of:

* the min of the children's bounds;
* the average reward obtained from node i - the confidence-interval term
  - the smoothness-constant term.
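To make this recursion concrete, here is a small Python sketch. It is only
an illustration under simplified assumptions of mine: the Node fields, the
exact form of the confidence-interval term (a UCT-like sqrt(2 log n / n_i)),
the initialization of unvisited nodes, and the way the smoothness constant
delta enters are placeholders, not the precise quantities from the paper.

import math
from dataclasses import dataclass, field

@dataclass
class Node:
    is_max: bool                  # True for even (max) nodes, False for odd (min) nodes
    visits: int = 0               # number of rewards observed from this node
    mean: float = 0.0             # average of those rewards, assumed in [0, 1]
    children: list = field(default_factory=list)

def bound(node, total_visits, delta):
    # B-value of a node: a direct confidence bound on its value, tightened
    # by the bounds of its children (if any).
    if node.visits == 0:
        return 1.0 if node.is_max else 0.0    # init for unvisited nodes (my choice)
    conf = math.sqrt(2.0 * math.log(total_visits) / node.visits)   # UCT-like term
    if node.is_max:
        b = node.mean + conf + delta          # average + confidence + smoothness
        if node.children:
            b = min(b, max(bound(c, total_visits, delta) for c in node.children))
    else:
        b = node.mean - conf - delta          # average - confidence - smoothness
        if node.children:
            b = max(b, min(bound(c, total_visits, delta) for c in node.children))
    return b

With delta = 0 the direct term is just the usual UCT bound, and with a very
large delta the children's bounds become the only active constraint, which
matches the two extremes (UCT and a flat UCB on the leaves) mentioned in the
quoted message below.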
> 2. Rewards can be more than just 1 or 0 (win or loss).

Yes, they are assumed to be in [0,1]. In the growing-tree approach (last
section), from each node a reward is obtained whose expected value is
assumed to be close to the true (min-max) value of the node, up to the bias
term (the smoothness-constant term). I think this is compatible with the
use of Monte-Carlo simulations, where, from a node, a random (or
quasi-random) simulation is run until the end (when the board is full).
Thus, even though the final outcome of a full run is 0 or 1 (win or loss),
the node from which the run is started has an expected value which is a
real number between 0 and 1 (because of the inherent randomness of the
Monte-Carlo method). Thus, given a simulation device (Monte-Carlo with or
without patterns), the expected value of a node is a real number. Of course
the true min-max value of each node is 0 or 1. (A tiny rollout sketch
illustrating this point is given at the end of this message, after the
quoted text.)

> 3. The goal is to find the optimal leaf rather than the optimal subtree.

I'm not sure I understand this comment. Actually, the goal is to find a
close-to-optimal play at the root level. Thus finding a close-to-optimal
min-max leaf achieves that purpose.

Cheers,
Remi

> I'm not trying to say anything about the usefulness of the result, but
> rather helping people decide how interested they'd be in the paper. There
> are likely counters to the points above. For example, the paper addresses
> #1 by saying "the minimax problem is a direct generalization of the
> results presented".
>
> On 4/24/07, Remi Munos <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > I have been encouraged to post a message to this list because we wrote
> > a paper on the analysis of UCT and other bandit-based methods for tree
> > search, and this might be useful for computer-go.
> >
> > The paper "Bandit Algorithms for Tree Search" is available as an INRIA
> > technical report at: http://hal.inria.fr/inria-00136198
> >
> > In short, we start by explaining why UCT may perform very poorly in
> > some cases because of excessive optimism. The regret of UCT is
> > asymptotically O(log n), but with an initial period that may be
> > prohibitively long.
> >
> > In order to avoid this problem, we need to modify the confidence
> > sequence so as to increase exploration when it is necessary. We propose
> > an algorithm (bandit algorithm in smooth trees) that takes into account
> > possible smoothness in the tree, namely that the "bias" of the rewards
> > along an optimal branch decreases with the depth of the considered
> > nodes in the branch, where the "bias" of a node is defined as the
> > difference between the value of the node and the expectation of the
> > rewards obtained from that node (i.e. the difference between the max,
> > or min-max, value and a Monte-Carlo estimate of its value).
> >
> > Like UCT, the choice of an action depends on upper confidence bounds,
> > but here, the bound of each node is computed in terms of the bounds of
> > its children, as well as a direct Chernoff-Hoeffding bound based on the
> > average rewards and the smoothness of the tree. We show that the
> > algorithm spends O(log n) time on suboptimal branches, thus essentially
> > exploring the optimal branch. The prior smoothness coefficient of the
> > tree is a parameter of the algorithm. If set to 0, this algorithm is
> > nothing else than UCT. If set to infinity, this algorithm is a flat UCB
> > performed on the leaves. If a correct parameter is set, it performs
> > better than both extremes. Of course the setting of a correct value of
> > the parameter for the specific game of go is left to the experts, which
> > I am not, unfortunately!
> >
> > Since several go programs are using UCT ideas, we thought this modified
> > version may be of interest. The paper considers the max-tree case, but
> > the extension to min-max search is straightforward.
> >
> > Best,
> > Remi Munos
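P.S. Here is the tiny rollout sketch promised in my answer to #2 above. It
is only meant to illustrate that, even though each single Monte-Carlo run
returns 0 or 1, the quantity estimated at a node is a real number in [0,1].
The Position class and the playout() function are toy stand-ins of mine for
whatever simulation device (Monte-Carlo, with or without patterns) a
program actually uses.

import random
from dataclasses import dataclass

@dataclass
class Position:
    # Toy stand-in for a go position; a real program would hold the board here.
    win_probability: float   # hidden "true" chance that a random playout is a win

def playout(position, rng):
    # Stand-in for one random (or quasi-random) simulation played until the
    # board is full: it returns a single 0/1 outcome (loss/win).
    return 1 if rng.random() < position.win_probability else 0

def estimated_value(position, n_playouts=1000, seed=0):
    # Averaging many 0/1 outcomes gives an estimate of the node's expected
    # reward: a real number in [0, 1], even though each outcome is 0 or 1.
    rng = random.Random(seed)
    wins = sum(playout(position, rng) for _ in range(n_playouts))
    return wins / n_playouts

print(estimated_value(Position(win_probability=0.62)))   # something close to 0.62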
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/