On Jun 10, 2007, at 7:00 AM, Eric Boesch wrote:

The UCT heuristic of trying every child of a node once before trying
any child twice is reasonable when the payoff distribution is unknown.
Why try the lever that paid $5 a second time if there might be another
lever that pays $1,000,000? But when the set of possible payoffs is
known to be {1, -1} -- you either win or you lose -- it makes no sense
to abandon a move with a perfect winning record.

I believe you automatically stick with always-winning moves if you give untried moves a value of 1.0, as Gelly et al did in their 2006 paper (table 7). An always-winning move has a UCT value of 1.0 plus the radius of the confidence interval, so it will be retried before trying a new move.

Before doing that, it might make sense to expend some effort on
reexamining the most promising candidates so far, to sort the really
good candidates from the ones that started out on a lucky streak and
to sort the robust moves from the ones that only work against inferior
replies. As a dead-simple heuristic, one might, when there some

Some degree of this comes for free by not creating a child for a node until it has a certain number of runs (e.g., the board area).

It is also easy to imagine why MC programmers in games with {1, -1}
payoff sets would discover that the choose-best-average-payoff
heuristic is not as good as the choose-the-move-with-the-most-wins
heuristic when selecting the best move to play. The older Mogo paper
mentioned some plausible reasons, and it's easy to imagine others. For
typical distributions, a 10-2 record is better than a 6-1 record,
which is better than a 1-0 record.

If I recall old experimental results properly, this doesn't work as well as one might expect. The problem is that 5-0 looks worse than 51-49.

I (and others) have come up with a lot of the same ideas you have discussed here. It's frustrating that just about every clever idea either (a) has already been explored and incorporated into the strong programs or (b) doesn't help because UCT automatically does it to some degree.

Peter Drake
http://www.lclark.edu/~drake/


_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/

Reply via email to