I think there was some confusion in Don's post on ``out of atari'' in play-outs. For one thing, I do not agree with the maximal-information argument. Testing ``out of atari'' moves is worthwhile not because they might be good or might be bad, but merely because they might be good. By contrast, you should prefer to test (in the tree) a kind of move that is either good or average over one that is either average or bad, even if both tests carry the same amount of information. In the tree, you look for the best move. Near the root at least; deeper down, where the evaluation is less precise, you merely look for good moves that keep a trustworthy evaluation of the position, and you try to avoid brittle ones.
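As a toy illustration of that first point (the win rates below are invented, not from any program): two candidate moves can carry exactly the same one bit of information, yet only the one that might be good can change which move you finally play.

best_known = 0.55   # value of the move we already trust

def value_of_testing(possible_values, best=best_known, p=0.5):
    # Expected value of the final choice after testing a move whose true value is
    # possible_values[0] with probability p, else possible_values[1]: we switch to
    # the tested move only when it turns out to beat the one we already know.
    return p * max(possible_values[0], best) + (1 - p) * max(possible_values[1], best)

# Move A is either good (0.65) or average (0.50); move B is either average (0.50) or bad (0.35).
print(value_of_testing((0.65, 0.50)))   # 0.60 -> testing A can improve the decision
print(value_of_testing((0.50, 0.35)))   # 0.55 -> testing B never changes it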
In the playouts, that's another matter. I would say that (almost) always playing ``out of atari'' would add stability, much in the way Magnus Persson explained very well.

What do we want of playout policies? Seen as static evaluation functions, we would want them to give the right ordering of move values, with differences as wide as possible. More precisely, we want this among the best moves; it does not matter much if the evaluation is imprecise for the bad moves. Now, we build a tree while computing the evaluation function, so we can tolerate false good moves if they are quickly seen as such in the tree, that is after a one- to three-ply search, and if the false good moves are not too numerous. False bad moves are much worse, since we might never explore the branch long enough to correct the impression.

The last paragraph also applies to pre-ordering (I keep a model like the one of Crazystone in mind, with a priori probabilities for each move in the tree, and also a distribution over moves in the playouts; a toy sketch of such a distribution is at the end of this post).

Conclusions: it does not matter if there is a bias in the playout policy, as long as it is the same for all moves. A bias toward solid moves is therefore a nuisance... Playing (not too numerous) nonsensical moves would only be a nuisance if there are positions whose associated playouts call for many more urgent moves than others do. What matters is telling apart two moves that have very different values. Here the reduction of noise comes into play: if there is a 50% chance in all playouts that a living group dies, and this decides the game, then the difference in evaluation between the other positions is divided by two (a small numeric check is at the end of this post)... Taking out all the common noise (that is, the mistakes that appear in all playouts) makes the distinction easier. On the other hand, having the playouts concentrate on a wrong evaluation (after this particular move, the group is dead) would be a catastrophe. If this comes from one particular move, it should be noticed and played/avoided systematically.

About learning on the fly: I agree completely; that was one of my first posts. However, I really think we should have learnt patterns (and other properties) beforehand: you cannot learn the whole of go, or even your own game, within a single game. And learning is a good thing, but you must find the good move first, and as quickly as possible.

For one thing, if we learn patterns on the fly, they should obviously be localized (not translation-invariant). We could also learn short move sequences. In a setting with a probability distribution over moves, taking that into account merely means changing the probability distribution. The question is: how? (A rough sketch of one possibility is at the end of this post.)

By the way, my guess is that learning on the fly would be more important in the playouts than in the tree: it would contribute to stabilizing the playouts, and the tree should end up with the good moves anyhow. This learning should probably also come from the playouts (they provide a great deal of information, and we could stick to information already computed for the playouts, allowing easy re-use), automatically building a status for groups that are only settled near the end...
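To make ``a distribution over moves in the playouts'' concrete, here is a minimal sketch; it is not Crazystone's actual policy, and the feature names and weights are invented for illustration.

import random

WEIGHTS = {
    "saves_own_group_from_atari": 100.0,   # (almost) always played when available
    "matches_local_pattern": 10.0,
    "plain_legal_move": 1.0,
}

def choose_playout_move(legal_moves, feature_of):
    # feature_of maps a move to one of the keys above; moves are then
    # sampled proportionally to their weight.
    weights = [WEIGHTS[feature_of(move)] for move in legal_moves]
    return random.choices(legal_moves, weights=weights, k=1)[0]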
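And the small numeric check of the ``divided by two'' remark (again with invented win rates): suppose the playouts, absent the unsettled group, would score move A at 0.70 and move B at 0.50.

def observed_winrate(true_value, p_group_dies=0.5):
    # A 50% chance, common to all playouts, that a living group dies and that
    # this alone decides the game (counted as a loss here).
    return (1 - p_group_dies) * true_value + p_group_dies * 0.0

print(0.70 - 0.50)                                       # 0.20 gap without the noise
print(observed_winrate(0.70) - observed_winrate(0.50))   # 0.10 gap: halved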
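Finally, on ``the question is: how?'': one possible shape of an answer, no more than a guess with invented parameters, is to keep weights on localized patterns and let each playout nudge them, the weights being multiplied into the playout distribution above.

from collections import defaultdict

pattern_weight = defaultdict(lambda: 1.0)   # weights over localized patterns

def update_from_playout(moves_with_patterns, winner, eta=0.05):
    # moves_with_patterns: (colour, local_pattern) pairs seen in one playout;
    # the winner's patterns are nudged up, the loser's down (crude multiplicative rule).
    for colour, pattern in moves_with_patterns:
        pattern_weight[pattern] *= (1 + eta) if colour == winner else (1 - eta)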
Jonas, who always rereads his posts thinking ``How do I manage to be that unclear!''