On 6/10/07, Peter Drake <[EMAIL PROTECTED]> wrote:
On Jun 10, 2007, at 7:00 AM, Eric Boesch wrote:
[..snipped...]
> It is also easy to imagine why MC programmers in games with {1, -1} > payoff sets would discover that the choose-best-average-payoff > heuristic is not as good as the choose-the-move-with-the-most-wins > heuristic when selecting the best move to play. The older Mogo paper > mentioned some plausible reasons, and it's easy to imagine others. For > typical distributions, a 10-2 record is better than a 6-1 record, > which is better than a 1-0 record.If I recall old experimental results properly, this doesn't work as well as one might expect. The problem is that 5-0 looks worse than 51-49.
If you ever get 5-0 and 51-49 as siblings, that's a potential problem. With vanilla MC/UCT, that will never happen (after the first of the 49 losses, that child would be ignored until every other child had lost once), which is why the most-visited was good enough for Mogo and why I settled for it. But now that you mention it, there's no reason to be lazy (the heuristic only gets applied once per possible move per turn, so you don't need simplicity for speed's sake), and with variants, you really could get 5-0 and 51-49 as siblings. Maybe a better compromise between most-wins and highest-winning-percentage would be to create lower confidence bounds by negating the confidence interval term of the UCB1 formula. Hopefully that would lead to comparisons that are at least defensible in just about every case -- 10-1 > 2-0, 25-0 > 70-30, 20-1 > 22-2, etc., some of which are comparisons that might arise even in vanilla MC/UCT.
I (and others) have come up with a lot of the same ideas you have discussed here. It's frustrating that just about every clever idea either (a) has already been explored and incorporated into the strong programs or (b) doesn't help because UCT automatically does it to some degree.
As I said, I'm not claiming to have invented anything here; it's just hard for me to keep track of where these things originated and what the usual terms for these things are. If the repeat-winners approach was first used by the Mogo team, then thanks for clarifying that. If you want to claim any of the other notions I mentioned, go ahead. As for experiencing limited improvements by tweaking the formulas, it would hardly be surprising if there are limits to how much an MC/UCT (or BAST or whatever) type program can be improved by just swapping out one formula and replacing it with another one using the same variables, since the formulas are pretty good to begin with, and the more glaring weaknesses of vanilla MC/UCT, which the strongest MC programs already address in other ways (though I think there is still room for improvement in their search localization in particular, not that I have any clear notion of how the improvement could be achieved without jeopardizing the programs' existing strengths), can't be fixed by just tweaking formulas. _______________________________________________ computer-go mailing list [email protected] http://www.computer-go.org/mailman/listinfo/computer-go/
