Remi Munos wrote: > .. only when the tree is very smooth, ie. when the branches that > appear good (from obtained rewards so far) are actually good and > the branches that appear bad are truly bad.
Go is a territorial game. I.e. an accumulative game, therefore when you loose territory you could have won, all your expectations are lower, but that has exceptions. I like your analysis until the Figure 1 in page 6. The figure makes an existing and very important UCT problem clear. I tried to make that problem clear in my posts about UCT and the ladder. After that, you give a solution that may be very good for other problems, but I think does not apply to go. Let me explain why. Go is "naturally smooth", but with notable exceptions: The exceptions are (Those a human can understand because he may foresee 20 moves or more, there may be others much harder to analyze.) Continuous atari (particularly ladders and ignoring hane on the first row) and semeai fights. In these situations UCT follows like in your figure, very badly. The golden rule is: If you won't win it, don't start playing it. The more you play, if you abandon before you win, the more you loose. These situations have names and ways to be detected, already implemented in go software for years. Attacking them with go-specific enhancements is much more efficient than with a general purpose algorithm, that has "side effects". The main "side effect" why we cannot solve the UCT problem making it "less optimistic" is that _we need that optimism_. The idea (before it was implemented) sound *too* optimistic, but it worked (to some extent), just because go is "globally smooth" due to its accumulative nature, even if "sometime locally sharp". Without the optimism of UCT, the problem is way too hard to be solved by simulation and MC returns to its pre-2006 state. Jacques. _______________________________________________ computer-go mailing list [email protected] http://www.computer-go.org/mailman/listinfo/computer-go/
