Re: [computer-go] Heuristics for MC/UCT with all-or-nothing payouts

Peter Drake Sun, 10 Jun 2007 09:57:57 -0700

On Jun 10, 2007, at 7:00 AM, Eric Boesch wrote:

The UCT heuristic of trying every child of a node once before trying
any child twice is reasonable when the payoff distribution is unknown.
Why try the lever that paid $5 a second time if there might be another
lever that pays $1,000,000? But when the set of possible payoffs is
known to be {1, -1} -- you either win or you lose -- it makes no sense
to abandon a move with a perfect winning record.

I believe you automatically stick with always-winning moves if yougive untried moves a value of 1.0, as Gelly et al did in their 2006paper (table 7). An always-winning move has a UCT value of 1.0 plusthe radius of the confidence interval, so it will be retried beforetrying a new move.

Before doing that, it might make sense to expend some effort on
reexamining the most promising candidates so far, to sort the really
good candidates from the ones that started out on a lucky streak and
to sort the robust moves from the ones that only work against inferior
replies. As a dead-simple heuristic, one might, when there some

Some degree of this comes for free by not creating a child for a nodeuntil it has a certain number of runs (e.g., the board area).

It is also easy to imagine why MC programmers in games with {1, -1}
payoff sets would discover that the choose-best-average-payoff
heuristic is not as good as the choose-the-move-with-the-most-wins
heuristic when selecting the best move to play. The older Mogo paper
mentioned some plausible reasons, and it's easy to imagine others. For
typical distributions, a 10-2 record is better than a 6-1 record,
which is better than a 1-0 record.

If I recall old experimental results properly, this doesn't work aswell as one might expect. The problem is that 5-0 looks worse than51-49.

I (and others) have come up with a lot of the same ideas you havediscussed here. It's frustrating that just about every clever ideaeither (a) has already been explored and incorporated into the strongprograms or (b) doesn't help because UCT automatically does it tosome degree.


Peter Drake
http://www.lclark.edu/~drake/


_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Heuristics for MC/UCT with all-or-nothing payouts

Reply via email to