[computer-go] Reflections on a disaster

Brian Sheppard Wed, 20 May 2009 17:39:52 -0700

The simplest problems give me new appreciation for the difficulties we face
in programming this


maddening game. Here is an example, with X to play:

 

  1 2 3 4 5 6 7 8 9

A - X - - - - X - - 

B - - - - X X - X X 

C X - - - X - X O O 

D X X X O X X O O O 

E O O O X X X X O O 

F - X X O O O O - O 

G - X O O - - - O O 

H O O X X X X O - - 

J - O O X - O O - - 

 

This position is not that complicated, and many players would make the
winning move (F1) without

thinking. After all, F1 captures the three-stone O string on the left, and
saves the three-stone X string below.

 

But a little more thought reveals that F1 is forced. The real problem in
this position is the five-stone X

string at bottom, which is locked in a semeai with the four-stone O string
at bottom left. X is winning

that semeai by 3 liberties to 2, but X needs to fill G1 and then H1 to
capture. Unfortunately, if X plays

G1 without playing F1 first, then G1 is self-atari and loses.

 

The bottom line is that the only winning sequence starts with F1. Otherwise,
O fills in G6, G5, and J5

before X can fill in F1, G1, and J1.

 

Such a simple situation. Would you figure that a program rated 1995 on CGOS
would have any

trouble with it? Well,.

 

What happens here is that Pebbles (as X) initially sees F1 as probably
*losing*. Here are the dynamics:

 

1)       I have measured that encouraging Atari moves in the trials is
self-defeating, so I don't do it.

2)       Pebbles generates ladder plays in the trials, but only adjacent to
the opponent's last play. (This won't help here.)

3)       There is an Atari bonus in the tree search, but the weight is
small.

4)       A larger weight is placed on proximity to the opponent's last move.

 

So here is the dynamic in the first 40 or so trials of F1: O will respond by
running out of Atari at C4.

X will play adjacent to that play, because even though G1 gets the Atari
bonus, playing C3 gets the

larger proximity bonus. O will rarely play J1 or G1, because these moves are
not bonused. Eventually,

O will play G6 or G5 or J5. And then X goes truly wrong because of the
proximity heuristic: X will make

a play "near" O's last play, and this is disastrous because it often fills
in X's own liberty. O then responds

near X's last play, which wins the semeai. So X loses the trial!

 

Of the first 40 trials, X is winning about 35%. Now, the problem is that in
the rest of the variations, X does

great in the early going. This is because O tries to run out of Atari by
playing F1, and then X captures with G1!

 

It takes many thousands of trials to prove that all of X's possible plays
have less than a 50% chance of

winning before attention returns to F1. Then F1 isn't preferred until over
60,000 trials have elapsed.

 

Here are a few reflections on this disaster:

 

1) Start on a positive note: this situation is very bad for the heuristics
encoded in Pebbles, yet UCT

solves the problem anyway. Indeed, UCT provides us with a scalable strategy
for *safely* encoding Go

knowledge into a search engine. UCT will solve the problem even if our
initial impression is wrong.

 

2) It is possible (and tempting) to write code that sees through this sort
of thing. But I have to wonder about

the scalability of that strategy. It takes a lot of time to create the code.
And testing is an issue. Can we apply

machine learning to discover move ordering knowledge? There are methods in
the literature already, but they

don't *scale*. Usually a finite pattern base is involved, or the cost of
pattern matching rises with the size of the

pattern set, or the knowledge gained cannot be proven to rise indefinitely.

 

3) Even if we do discover move ordering knowledge, is that sufficient? I
have doubts. It seems to me that

improving move ordering is a constant speed-up. That is, it doesn't provide
efficiency gains that increase

with increases in computer power. Specifically, the gain is bounded by the
number of trials required for

UCT-RAVE to discover the recommended moves. Granted, this can be a *lot* of
trials. But keep in mind that

heuristics often produce the *wrong* move ordering, too. In that case there
is a loss of efficiency.

 

4) This is just a puny 9x9 board with just two semeais, each of which is
between 2 and 6 moves long.

Things can get a lot more complicated than that, even on the small board. On
a 19x19 board, there are

a lot of battles, and the complexities rise combinatorially.

 

5) A "no free lunch" theorem is likely to apply; to determine whether a
single stone is alive on a Go board is

an NP-complete problem. So in theory all of our heuristics are subject to
bad cases. This case happens to

trip Pebbles. If Pebbles had more complicated heuristics then there would be
other cases.

 

Something to think about.

 

Best,

Brian

_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/

[computer-go] Reflections on a disaster

Reply via email to