I reread the original UCT paper, to make sure that I understood why
UCT works. The proof has the following outline:

    1) Show that the UCB formula allocates O(log T) trials to every move,
    where T is the number of trials of this node.

    2) Show that under conditions of "slow drift" that a process having
    the property above converges (recursively) to the the true optimum.

It seemed to me that after proving the O(log T) property, the exact
form of the UCB formula did not enter into the proof.

So I expect that other exploration policies that guarantee O(log T)
exploration would also be asymptotically optimal. An example of such
a policy is the epsilon-greedy policy with exploration rate O(1/T).

One of the problems with UCT is that its convergence rate is very slow.
In the worst case, convergence for a single node can require exponential
time. This is a direct consequence of the O(log T) exploration bound.
Note that for a tree of depth D, the convergence bound is a "tower" of
exponentials.

You can vary the exploration rate by changing to O(log(T) / T), for
example, which would explore more aggressively. And I expect that
exploring at a rate O(1/sqrt(T)) would also asymptotically converge.

The balance that you have to strike is between exploration and
exploitation. The choices that a child makes affects its parent
(games imitating life :-). Children can slow the flow of information to
parents by exploring either too much or too little. There is a domain-
dependent design decision to be made here.

Note that at the root of the tree there is no parent to worry about.

This post has several points:

1) The UCB formula is being used for a side-effect. Alternatives
that achieve that side-effect should have comparable properties.

2) You can make design choices in the exploration/exploitation to
achieve asymptotic convergence with better worst-case behavior.

Best,
Brian

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Reply via email to