On 4/25/07, Remi Munos <[EMAIL PROTECTED]> wrote:

> When I did sit down and read the paper for real, I saw that both of
those
> things were just building up the support for the BAST algorithm.  (a) is
to
> justify accepting a higher regret (in exchange for better worst case
> performance).

well, this is not really accepting higher regret. I would say that this is
to
guarantee lower regret in all cases. UCT has (non-asymptotic, ie. for a
fixed
stage n) low regret (in the worst case) only when the tree is very smooth,
ie. when the branches that appear good (from obtained rewards so far) are
actually good and the branches that appear bad are truly bad. Of course, I
don't know how much true this is for go. But in bad cases, UCT is terribly
bad! Basically, one cannot expect UCT to have (for a fixed n) a regret
independent of D (depth of the tree) unless the tree is smooth with a
smoothness constant = 0 (ie. all values are the same!). I see BAST as a
natural extension of UCT when the tree is not that smooth. If no
smoothness
exists at all, I think flat-UCB is the best (for a given tree of depth D).



I saw the analysis of regret = log(n) for UCT and regret = sqrt(n) for the
modified UCB.  I then assumed that sqrt(n) >> log(n) for most n.  The
graphical example would find (D-1)/D almost immediately.  I then assumed the
regret for normal UCT was better up until the modified UCT found the 1.

I suspect that this discussion hits on the main point of the paper.  When I
read the paper, the focus on the worst case performance overshadowed general
applicability to a larger variety of trees.  The additional description of
the variance in the L=0 results (for normal UCT and the BAST variation on
it) would do a lot to help me appreciate the results in the paper.
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/

Reply via email to