Do you have any indication, which can be derived from the playouts, that a 
position might deserve an extra allotment of thinking time?

 Terry McIntyre <[email protected]>


Any system of entrusting the government to judge and correct its own abuses is 
the same as appointing the accused criminal as his own judge and jury: don't 
expect many convictions.


-- Allen Thornton, Laws of the Jungle




________________________________
From: David Fotland <[email protected]>
To: computer-go <[email protected]>
Sent: Wednesday, May 20, 2009 10:46:45 PM
Subject: RE: [computer-go] Reflections on a disaster


Many Faces’ static move generator suggests F1 as the first
move to try.  Still it needs about 35K playouts before F1 is preferred.  For
some unknown reason it likes H1 before that.  F1  at 35K playouts has
a pretty low win rate, about 35%, because the playouts can’t figure out
the semeai.  It needs a million playouts before it gets to 90% confident
on F1 (about 25 seconds).
 
David
 
From:[email protected]
[mailto:[email protected]] On Behalf Of Brian Sheppard
Sent: Wednesday, May 20, 2009 5:39 PM
To: [email protected]
Subject: [computer-go] Reflections on a disaster
 
The
simplest problems give me new appreciation for the difficulties we face in
programming this
maddening
game. Here is an example, with X to play:
 
  1 2 3 4 5 6 7 8 9
A - X - - - - X - - 
B - - - - X X - X X 
C X - - - X - X O O 
D X X X O X X O O O 
E O O O X X X X O O 
F - X X O O O O - O 
G - X O O - - - O O 
H O O X X X X O - - 
J - O O X - O O - - 
 
This
position is not that complicated, and many players would make the winning move
(F1) without
thinking.
After all, F1 captures the three-stone O string on the left, and saves the
three-stone X string below.
 
But
a little more thought reveals that F1 is forced. The real problem in this
position is the five-stone X
string
at bottom, which is locked in a semeai with the four-stone O string at bottom
left. X is winning
that
semeai by 3 liberties to 2, but X needs to fill G1 and then H1 to capture.
Unfortunately, if X plays
G1
without playing F1 first, then G1 is self-atari and loses.
 
The
bottom line is that the only winning sequence starts with F1. Otherwise, O
fills in G6, G5, and J5
before
X can fill in F1, G1, and J1.
 
Such
a simple situation. Would you figure that a program rated 1995 on CGOS would
have any
trouble
with it? Well,…
 
What
happens here is that Pebbles (as X) initially sees F1 as probably *losing*.
Here are the dynamics:
 
1)  I have measured that
encouraging Atari moves in the trials is self-defeating, so I don’t do it.
2)  Pebbles generates
ladder plays in the trials, but only adjacent to the opponent’s last
play. (This won’t help here.)
3)  There is an Atari
bonus in the tree search, but the weight is small.
4)  A larger weight is
placed on proximity to the opponent’s last move.
 
So
here is the dynamic in the first 40 or so trials of F1: O will respond by
running out of Atari at C4.
X
will play adjacent to that play, because even though G1 gets the Atari bonus,
playing C3 gets the
larger
proximity bonus. O will rarely play J1 or G1, because these moves are not
bonused. Eventually,
O
will play G6 or G5 or J5. And then X goes truly wrong because of the proximity
heuristic: X will make
a
play “near” O’s last play, and this is disastrous because it
often fills in X’s own liberty. O then responds
near
X’s last play, which wins the semeai. So X loses the trial!
 
Of
the first 40 trials, X is winning about 35%. Now, the problem is that in the
rest of the variations, X does
great
in the early going. This is because O tries to run out of Atari by playing F1,
and then X captures with G1!
 
It
takes many thousands of trials to prove that all of X’s possible plays
have less than a 50% chance of
winning
before attention returns to F1. Then F1 isn’t preferred until over 60,000
trials have elapsed.
 
Here
are a few reflections on this disaster:
 
1)
Start on a positive note: this situation is very bad for the heuristics encoded
in Pebbles, yet UCT
solves
the problem anyway. Indeed, UCT provides us with a scalable strategy for
*safely* encoding Go
knowledge
into a search engine. UCT will solve the problem even if our initial impression
is wrong.
 
2)
It is possible (and tempting) to write code that sees through this sort of
thing. But I have to wonder about
the
scalability of that strategy. It takes a lot of time to create the code. And
testing is an issue. Can we apply
machine
learning to discover move ordering knowledge? There are methods in the
literature already, but they
don’t
*scale*. Usually a finite pattern base is involved, or the cost of pattern
matching rises with the size of the
pattern
set, or the knowledge gained cannot be proven to rise indefinitely.
 
3)
Even if we do discover move ordering knowledge, is that sufficient? I have
doubts. It seems to me that
improving
move ordering is a constant speed-up. That is, it doesn’t provide
efficiency gains that increase
with
increases in computer power. Specifically, the gain is bounded by the number of
trials required for
UCT-RAVE
to discover the recommended moves. Granted, this can be a *lot* of trials. But
keep in mind that
heuristics
often produce the *wrong* move ordering, too. In that case there is a loss of
efficiency.
 
4)
This is just a puny 9x9 board with just two semeais, each of which is between 2
and 6 moves long.
Things
can get a lot more complicated than that, even on the small board. On a 19x19
board, there are
a
lot of battles, and the complexities rise combinatorially.
 
5)
A “no free lunch” theorem is likely to apply; to determine whether
a single stone is alive on a Go board is
an
NP-complete problem. So in theory all of our heuristics are subject to bad
cases. This case happens to
trip
Pebbles. If Pebbles had more complicated heuristics then there would be other
cases.
 
Something
to think about…
 
Best,
Brian


      
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/

Reply via email to