I think many of you completely miss the point about the scalability issue.

You can ALWAYS construct a situation where leaf-node evaluation is
wrong, or where it will take an arbitrary number of plies to solve a
particular problem.  In chess it is possible to construct problems
that cannot be solved even in 100 ply.  That in no way means the
approach is not scalable.  It just means you might need 100 ply to
play that one position correctly.  In minimax search, every time you
add a ply you make the program play a FEW moves better, not EVERY
move.  That's what scalability is.
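A minimal sketch of that point (mine, not from the post): a toy game tree where searching one ply deeper fixes the move choice in this ONE position while shallower search is fooled, which is exactly the sense in which adding a ply makes a program play "a FEW moves better".

```python
# Toy illustration: plain minimax over a (heuristic_value, children) tree.
# At the depth limit or at a leaf we return the static heuristic value,
# which may be badly wrong -- deeper search is what corrects it.

def minimax(node, depth, maximizing):
    value, children = node
    if depth == 0 or not children:
        return value
    scores = [minimax(c, depth - 1, not maximizing) for c in children]
    return max(scores) if maximizing else min(scores)

def best_move(node, depth):
    """Index of the child with the best minimax score for the maximizer."""
    _, children = node
    return max(range(len(children)),
               key=lambda i: minimax(children[i], depth - 1, False))

# Move 0 looks great to a shallow search (heuristic +9) but has a
# refutation one ply deeper (-5); move 1 is genuinely solid (+2).
tree = (0, [
    (9, [(9, []), (-5, [])]),  # the trap
    (3, [(3, []), (2, [])]),   # the sound move
])

# A 1-ply search falls for the trap; a 2-ply search finds the sound move.
```

Other positions (with no trap) would be played identically at either depth; only the trap position improves with the extra ply.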

But what keeps getting implied is that uniform random is more scalable,
even though it is well known to be an extremely biased evaluation
function.  It is certainly not immune to misconceptions, and it does
not imply an "unbiased" way to search.  Every play-out policy (except
perfect play) has bias in it, but that bias does not put a limit on the
program's strength.
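To make the bias of uniformly random play-outs concrete, here is a small sketch (my own toy example, not from any Go program) using a one-pile subtraction game, where the true game-theoretic value of a position and its random-playout estimate disagree sharply:

```python
import random

# Subtraction game: players alternately remove 1-3 stones; whoever takes
# the last stone wins.  A pile of 4 is a LOSS for the player to move
# under perfect play (true value 0), yet uniformly random play-outs
# estimate its winrate near 1/3 -- a biased evaluation function.

def random_playout(pile, rng):
    """1 if the player to move at the root wins this random game, else 0."""
    root_player_wins = False
    turn = 0  # 0 = player to move at the root
    while pile > 0:
        pile -= rng.randint(1, min(3, pile))
        if pile == 0:
            root_player_wins = (turn == 0)
        turn ^= 1
    return 1 if root_player_wins else 0

def mc_eval(pile, n_playouts, seed=0):
    """Monte Carlo winrate estimate using the uniform-random policy."""
    rng = random.Random(seed)
    return sum(random_playout(pile, rng) for _ in range(n_playouts)) / n_playouts
```

More play-outs only sharpen the estimate toward the biased value (about 1/3 here), not toward the true value (0); it is the tree search above the play-outs that has to correct this.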

And YES, you can prove that one play-out policy will be better in SOME
positions than another.  Uniformly random will play better than Mogo's
intelligent play-out in some position you can probably construct if you
try hard enough.  But just try playing a match and see where it gets you.

- Don



Harald Korneliussen wrote:
> Ivan Dubois mentioned the bent-four-in-the-corner shape as a
> scalability killer: a situation where more playouts don't help
> (much), because the playouts systematically misevaluate it. As I
> understand it, it could be corrected in the tree, but this is very
> unlikely to happen until the very end of a game, by which time it may
> be too late (Mogo having worked the entire game for that solid 0.5
> win, which turns out to be a solid loss instead because of the
> life-and-death misevaluation).
>
> I recalled a KGS game of Mogo I'd looked at, where something very
> similar happened, and with a little digging I found it again:
>
> http://files.gokgs.com/games/2007/12/1/Ardalan-MoGoBot3.sgf
>
> It turns out it's not the "bent four" shape, but I suspect it's
> another such shape, where more playouts only confirm that "these moves
> aren't worth including into the tree", so that UCT catches them very
> late, if ever.
>
> If these situations can be reliably created by a human, then indeed
> they put an upper limit on the "real world" scalability of a program.
>
> If I should propose a hackish heuristic to deal with such situations,
> this is it: At one point, when the problematic shape appeared, the
> human must have done a move that to the computer seemed horribly bad.
> "Why did he do that? Doesn't he see that my shape is alive?". When
> such situations occur, there are two possibilities:
>
> 1. The bot is playing a weaker human player, and the move is indeed bad.
> 2. The bot is playing a stronger human, and the move is actually good.
>
> I think it may be a good idea to do something with the weighting in
> these situations, so that the relevant moves are added to the tree. In
> the worst case, a lot of effort is wasted in proving a bad move bad -
> but this should not be so serious, as the bad move will likely mean
> the opponent has poor chances of winning anyway. In the best case, the
> program's blunder is revealed after the fact. This may still leave
> little chance of winning (if the life-and-death error was severe), but
> at least the program's counting won't be off for the rest of the game.
> Since today's programs don't care about winning margins, a counting
> error of even a single point will spell disaster.
>
> I believe this heuristic would be cheap in terms of computational
> cost, but hard to evaluate and tune. Self-play would not be very
> effective...
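Harald's "surprising move" heuristic could be sketched roughly like this (all names and constants here are mine for illustration, not from MoGo or any real program): when the opponent actually plays a move the search had rated as terrible, flag that node and inflate the UCT exploration term beneath it, so the line gets examined instead of being neglected.

```python
import math

SURPRISE_THRESHOLD = 0.10  # opponent move the bot rated below 10% winrate
SURPRISE_BONUS = 0.5       # extra exploration weight under such a move

class Node:
    """Minimal UCT tree node (illustrative only)."""
    def __init__(self, prior_winrate=0.5):
        self.wins = 0.0
        self.visits = 0
        self.prior_winrate = prior_winrate  # bot's estimate before the move
        self.surprising = False

def mark_surprise(played_node):
    """Call when the opponent plays a move we had rated very low."""
    if played_node.prior_winrate < SURPRISE_THRESHOLD:
        played_node.surprising = True

def uct_score(child, parent):
    """Standard UCT, with a boosted exploration constant below a
    surprising opponent move so its follow-ups get searched."""
    if child.visits == 0:
        return float("inf")
    exploit = child.wins / child.visits
    c = 1.0 + (SURPRISE_BONUS if parent.surprising else 0.0)
    return exploit + c * math.sqrt(math.log(parent.visits) / child.visits)
```

As the post says, in the worst case this just spends extra effort refuting a genuinely bad move; in the best case it surfaces the bot's life-and-death blunder while the count can still be fixed.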
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/
