Remi Munos wrote:

> .. only when the tree is very smooth, ie. when the branches that
> appear good (from obtained rewards so far) are actually good and
> the branches that appear bad are truly bad.

Go is a territorial game. I.e. an accumulative game, therefore
when you loose territory you could have won, all your expectations
are lower, but that has exceptions.

I like your analysis until the Figure 1 in page 6. The figure
makes an existing and very important UCT problem clear. I tried
to make that problem clear in my posts about UCT and the ladder.

After that, you give a solution that may be very good for other
problems, but I think does not apply to go. Let me explain why.

Go is "naturally smooth", but with notable exceptions: The
exceptions are (Those a human can understand because he may foresee
20 moves or more, there may be others much harder to analyze.)
Continuous atari (particularly ladders and ignoring hane on the
first row) and semeai fights. In these situations UCT follows like
in your figure, very badly. The golden rule is: If you won't win it,
don't start playing it. The more you play, if you abandon before you
win, the more you loose. These situations have names and ways to
be detected, already implemented in go software for years.
Attacking them with go-specific enhancements is much more efficient
than with a general purpose algorithm, that has "side effects".

The main "side effect" why we cannot solve the UCT problem making
it "less optimistic" is that _we need that optimism_.
The idea (before it was implemented) sound *too* optimistic, but
it worked (to some extent), just because go is "globally smooth"
due to its accumulative nature, even if "sometime locally sharp".

Without the optimism of UCT, the problem is way too hard to be
solved by simulation and MC returns to its pre-2006 state.


Jacques.


_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/

Reply via email to