On 4/25/07, Remi Munos <[EMAIL PROTECTED]> wrote:
> When I did sit down and read the paper for real, I saw that both of those > things were just building up the support for the BAST algorithm. (a) is to > justify accepting a higher regret (in exchange for better worst case > performance). well, this is not really accepting higher regret. I would say that this is to guarantee lower regret in all cases. UCT has (non-asymptotic, ie. for a fixed stage n) low regret (in the worst case) only when the tree is very smooth, ie. when the branches that appear good (from obtained rewards so far) are actually good and the branches that appear bad are truly bad. Of course, I don't know how much true this is for go. But in bad cases, UCT is terribly bad! Basically, one cannot expect UCT to have (for a fixed n) a regret independent of D (depth of the tree) unless the tree is smooth with a smoothness constant = 0 (ie. all values are the same!). I see BAST as a natural extension of UCT when the tree is not that smooth. If no smoothness exists at all, I think flat-UCB is the best (for a given tree of depth D).
I saw the analysis of regret = log(n) for UCT and regret = sqrt(n) for the modified UCB. I then assumed that sqrt(n) >> log(n) for most n. The graphical example would find (D-1)/D almost immediately. I then assumed the regret for normal UCT was better up until the modified UCT found the 1. I suspect that this discussion hits on the main point of the paper. When I read the paper, the focus on the worst case performance overshadowed general applicability to a larger variety of trees. The additional description of the variance in the L=0 results (for normal UCT and the BAST variation on it) would do a lot to help me appreciate the results in the paper.
_______________________________________________ computer-go mailing list [email protected] http://www.computer-go.org/mailman/listinfo/computer-go/
