>I'm curious to find out what is meant by "lazy". If, as I am led to >believe by your report, Monte Carlo strategies applied to Double Step >Races are "lazy", yet they converge to perfect play, then I'm not sure >why we are meant to worry.
We worry because the 7 facts below combine to propagate bad decisions by playouts into bad moves. 1), UCT's convergence guarantee is probabilistic. That is, given an epsilon, the probability of having an optimal move will eventually be less than epsilon. 2) the convergence guarantee of UCT is very weak. UCT guarantees that if a node has T trials then every move will receive O(log T) trials. The implication is that the worst case number of trials for achieving a given epsilon is O(exp(-epsilon)). And that is just for a *single* node. If you have a tree of depth N, then the worst case guarantee is O(exp(exp(exp(exp...N times...(exp(epsilon)))). 3) the quality of MC playouts is not guaranteed. It follows that UCT is optimizing the *sum* of the game theoretic score and the *bias* of the playouts. 4) UCT search cannot be deep enough to solve all of the problems on the Go board consecutively. It will make some progress down some of those variations, but then push the rest into the playouts. 5) Forcing moves, such as atari-connect, exist in great numbers in well-played Go positions. "Well-played" here means that the players are not wasting moves. "No wasted moves" means that the balance between life and death is decided by a single turn. "Decided by a single turn" means that many forcing moves exist, and they come to nothing with correct play, but will win if an error is made. Imagine a situation that the playouts do not handle well, such as Ingo's races. The outcome is bad for Black in alternating play. But the playout mishandles it, perhaps because of laziness, and sees that Black wins 40%. What will UCT do? Suppose that UCT requires 1000 trials to solve that problem (e.g., Black realizes that when White plays E5 then Black will lose a string). For Black, UCT will prefer a sequence of forcing moves (elsewhere) that push the loss over the horizon. If White's opportunity to play E5 is delayed long enough, then fewer than 1000 trials will be available to search White's key move, so Black will be happy because the probability of loss is reduced from 100% to 60%. If UCT searches deeply enough to work through all forcing moves whose "temperature" is higher than the difference between winning and losing that local situation, then Black will start to see that there is no reason to delay the inevitable, and begin to think about alternative plays. But then you have to overcome two final fact: 6) When UCT starts thinking about a winning move after trying losing moves for thousands of iterations, then it will require *many* thousands of iterations to change its mind about the eval of the position. E.g., if you have 1000 trials of moves that win 40%, and now you find a move that wins 55%, then you need 2000 more trials for the score of the node to recover to 50%. 7) Solving a problem down one branch is not enough. You must solve it on all sequences. Combine exponential worst-case time with the slowness in recovering from an error, and with the transmission delay in propagating changes through the tree, and with the need to refute forcing moves in all sequences. Do this on a board of size 19x19 that has 5 groups of each color and 10 active local battlefields between them, and you have the makings of a combinatorial explosion of nuclear magnitude. _______________________________________________ computer-go mailing list [email protected] http://www.computer-go.org/mailman/listinfo/computer-go/
