> Darren Cook wrote:
> > What do your program's playouts think when presented with the board
> > position in the article? This is a terminal position, both players have
> > passed, a comfortable white win, yet pure random playouts think black
> > will win more often.
> >
> >>> http://dcook.org/compgo/article_the_problem_with_random_playouts.html

Looking at this position, I asked "how do humans look at this situation?"
As Darren said, we don't extrapolate all possible lines of play to the end.
Instead, humans create a sort of proof which condenses vast numbers of playouts
into a few simple thoughts.

For instance, at the top of the board, there are two black stones in atari. One
of the neighboring white groups has two liberties; the other has three.
Therefore, black cannot save his stones by a direct capture. Suppose black
makes a threat - such as putting the white stones in the upper left into atari.
Such a threat can be answered by capturing the two black stones, gaining two
extra liberties for the white group. A short tree search confirms that black
cannot pursue that line. The other alternative is for the two black stones to
extend to the edge of the board. A short tree analysis shows the futility of
that line of play. This sort of analysis is excruciatingly obvious to a human
player. Similar analysis determines the status of the other groups, and the
four singlet stones - there is no prayer of changing the outcome, assuming
proper play by the opponent.
This is an end-game situation; the outcome is already blindingly obvious. Back up
a few moves, the position is a bit more open, the min-max trees become a bit
more bushy, but it's still tractable for ordinary mid-kyu human players.

Is it possible to incorporate information gleaned from such directed proofs of
life-death status into playouts, so that the best line of play is automatically
followed? Supposing that min-max search reveals that B is the best response to
A, while C or D would "snatch defeat from the jaws of victory", can the
playouts be discouraged from following A with C or D -- until such time as the
situation has changed in a provable sense?
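One way this might look, as a rough sketch only: keep a table of search-proven replies, each tied to a snapshot of the local region it was proven in, and consult it before picking a random playout move. The class name `ProvenReplies`, the function `choose_playout_move`, and the "region snapshot" test for whether the situation has changed are all hypothetical; a real program would need a sounder invalidation rule than comparing a handful of points.

```python
import random

class ProvenReplies:
    """Search-proven replies, each valid only while a local region is unchanged."""

    def __init__(self):
        # trigger move -> (best reply, refuted replies, snapshot, region)
        self._table = {}

    @staticmethod
    def _snapshot(board, region):
        return tuple(board.get(p) for p in region)

    def record(self, board_after_trigger, trigger, best, refuted, region):
        """Store a result proven on the position just after `trigger` was played."""
        region = tuple(region)
        self._table[trigger] = (best, frozenset(refuted),
                                self._snapshot(board_after_trigger, region), region)

    def lookup(self, board, last_move):
        """Return (best, refuted) if the proof still applies, else None."""
        entry = self._table.get(last_move)
        if entry is None:
            return None
        best, refuted, snap, region = entry
        if self._snapshot(board, region) != snap:
            del self._table[last_move]  # region changed; the proof is stale
            return None
        return best, refuted

def choose_playout_move(board, legal_moves, last_move, proven, rng=random):
    """Play the proven reply if there is one; otherwise avoid refuted moves."""
    hit = proven.lookup(board, last_move)
    if hit is None:
        return rng.choice(legal_moves)
    best, refuted = hit
    if best in legal_moves:
        return best
    allowed = [m for m in legal_moves if m not in refuted]
    return rng.choice(allowed or legal_moves)
```

Here A is the trigger, B is `best`, and C and D go into `refuted`. As soon as any point in the recorded region differs from the snapshot, the entry is dropped and the playout falls back to plain random selection.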
When I spoke of "a short tree analysis", in human terms, that tree is short
because we exclude dumb moves. When a group is in atari at the top left, we
don't try to save it by playing at random locations in the middle or bottom
right; we know from the shape that some moves have a chance of altering the
outcome, and others could not conceivably make a difference. We give up some
stones and save others; sometimes we deliberately sacrifice a few to gain
advantages. Our analysis, at a fundamental level, depends on the basic rule:
players take turns. You don't get to make two moves in a row; instead, you make
a threat which forces the move you want, causing the weakness which you exploit.

Much of what we know about life and death can easily be expressed in the form
of conditional move trees or state machines -- if Black plays A, respond with
B; switch states; follow C with D; etc.
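To illustrate what I mean by a conditional move tree, here is a small Python sketch of such a state machine. The move labels A to D are the placeholders used above, and the state names, the `RULES` table and the `ResponseMachine` class are invented for the example.

```python
# state -> {opponent move -> (our reply, next state)}
# Move labels and state names are placeholders, not real board points.
RULES = {
    "start":    {"A": ("B", "after_AB")},
    "after_AB": {"C": ("D", "settled")},
    "settled":  {},
}

class ResponseMachine:
    """Follows scripted replies and switches state after each one."""

    def __init__(self, rules, state="start"):
        self.rules = rules
        self.state = state

    def respond(self, opponent_move):
        """Return the scripted reply, or None if this move is not covered."""
        transition = self.rules[self.state].get(opponent_move)
        if transition is None:
            return None  # no opinion; the caller picks a move some other way
        reply, self.state = transition
        return reply

machine = ResponseMachine(RULES)
first_reply = machine.respond("A")   # scripted: answer A with B
second_reply = machine.respond("C")  # scripted: answer C with D
```

Returning None for uncovered moves leaves room for the playout to choose randomly whenever the machine has nothing to say.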
Would such a state machine integrate well with random playouts? It would help
if the state machine can be probabilistic; sometimes you definitely want to
save that 25-point group, other times you are willing to think about giving up
10 points here to gain 12 there - perhaps there is a ko fight in progress. What
is at stake? What is being risked?
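A probabilistic version could weigh the local stake against what might be gained elsewhere. The sketch below is only a guess at one possible weighting: the formula in `follow_probability`, the floor and ceiling values, and both function names are assumptions of mine, not something taken from an existing program.

```python
import random

def follow_probability(points_at_stake, points_elsewhere,
                       floor=0.05, ceiling=0.95):
    """Assumed weighting: local stake as a share of local plus alternative gain."""
    total = points_at_stake + points_elsewhere
    if total <= 0:
        return floor
    return max(floor, min(ceiling, points_at_stake / total))

def pick_reply(scripted_reply, other_moves, points_at_stake,
               points_elsewhere, rng=random):
    """Follow the scripted reply with a stake-dependent probability."""
    if scripted_reply is None:
        return rng.choice(other_moves) if other_moves else None
    p = follow_probability(points_at_stake, points_elsewhere)
    if not other_moves or rng.random() < p:
        return scripted_reply
    return rng.choice(other_moves)

# A 25-point group with nothing better elsewhere: almost always defended.
p_big_group = follow_probability(25, 0)
# Giving up 10 points here to gain 12 there: roughly a coin flip.
p_trade = follow_probability(10, 12)
```

The clamping keeps a little randomness even for large stakes, so the playouts can still occasionally explore the sacrifice.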