[computer-go] Bent four in the corner was:Scalability problem of play-out policies

Harald Korneliussen Wed, 23 Jan 2008 06:31:45 -0800

Ivan Dubois mentioned  the bent four in the corner shape as a
scalability killer, a situation where more playouts doesn't help
(much), because playouts systematically misevaluate it. As I
understand it, it could be corrected in the tree, but this is very
unlikely to happen until the very end of a game, by which time it may
be too late (Mogo having worked the entire game for that solid 0.5
win, which turns out to be a solid loss instead because of the
life-and-death misevaluation)


I recalled a KGS game of Mogo I'd looked at, where something very
similar happened, and with a little digging I found it again:

http://files.gokgs.com/games/2007/12/1/Ardalan-MoGoBot3.sgf

It turns out it's not the "bent four" shape, but I suspect it's
another such shape, where more playouts only confirm that "these moves
aren't worth including into the tree", so that UCT catches them very
late, if ever.

If these situations can be reliably created by a human, then indeed
they put an upper limit on the "real world" scalability of a program.

If I should propose a hackish heuristic to deal with such situations,
this is it: At one point, when the problematic shape appeared, the
human must have done a move that to the computer seemed horribly bad.
"Why did he do that? Doesn't he see that my shape is alive?". When
such situations occur, there are two possibilities:

1. The bot is playing a weaker human player, and the move is indeed bad.
2. The bot is playing a stronger human, and the move is actually good.

I think it may be a good idea to do something with the weighting in
these situations, so that the relevant moves are added to the tree. In
worst case, a lot of effort is wasted in proving a bad move bad - but
this should not be so serious, as the bad move will likely mean the
opponent has poor chances of winning anyway. In the best case, the
program's blunder is revealed after the fact. This may still leave
little chance of winning, (if the l&d error was severe) but at least
the program's counting won't be off for the rest of the game. Since
today's programs don't care for winning margins, counting errors by
even a single point will spell disaster.

I believe this heurtistic would be cheap in terms of computational
cost, but hard to evaluate/tune. Self-play would not be very
effective...
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

[computer-go] Bent four in the corner was:Scalability problem of play-out policies

Reply via email to