I don't think "only uniformly random playouts will scale to perfection", because what we need from playouts is not a simple average of final scores but the maximum score (in the negamax sense). That maximum is what a perfect evaluation function would return.

In other words, since MC simulation is a way to estimate the average of a value, applying it to optimization problems requires some way to focus the simulations on the _peak_ in the state space.

It may be obvious when one considers L&D (life-and-death) problems, where only one move leads to the maximum score (live) and all other moves are bad. In such positions it makes almost no sense to simulate all legal moves with the same probability. So, IMHO, biasing simulations is not just a speed-up technique but is essential.
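To make the average-versus-maximum point concrete, here is a minimal sketch on a made-up one-ply L&D position. The tree encoding (nested lists, with leaves scored from the point of view of the player to move) and the function names are assumptions for illustration only, not anything from an actual Go program.

```python
def negamax(node):
    """Best achievable score for the player to move (negamax)."""
    if isinstance(node, (int, float)):
        return node  # leaf: score from the point of view of the player to move
    return max(-negamax(child) for child in node)


def uniform_average(node):
    """Exact expected score of a playout that picks every legal move
    with the same probability."""
    if isinstance(node, (int, float)):
        return node
    return sum(-uniform_average(child) for child in node) / len(node)


# Hypothetical L&D position with five legal moves. Only the first move
# lives: the opponent is then to move in a lost position (leaf -1).
# The other four moves die (leaf +1 for the opponent).
position = [-1, 1, 1, 1, 1]

print(negamax(position))          # 1: the group lives with correct play
print(uniform_average(position))  # -0.6: the average says the group is dead
```

The two numbers disagree in sign: the average over all legal moves is dominated by the four bad moves, while the negamax value depends only on the single good one.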

I agree, but what I meant about uniformly random playouts is the following:
What makes a move outstanding is being unpredictable. For a total novice, playing at the key point of a bulky five may look like a touch of genius, but once you learn a little, it's an obvious move. The difference between a 5p and a 9p may be one or two moves nobody can predict (except a 9p). When we add knowledge we find the _ordinary_ good moves faster and make weaker moves less probable, but that comes at a price: outstanding, unpredictable moves also become less probable. Perhaps that introduces a ceiling. I thought that was what you were also pointing out. Of course, I don't claim uniformly random playouts are good. I only claim that they should (as an infeasible theoretical argument) scale to perfection; that scaling doesn't have to be linear.
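As a rough illustration of that price, the sketch below counts how many playouts are needed before a move that the playout policy picks with probability p is tried at least once. The function name, the 95% confidence level and the example probabilities are assumptions made up for this illustration; it also looks at a single move choice in isolation, which is much simpler than a real playout.

```python
import math


def playouts_until_tried(p, confidence=0.95):
    """Number of independent playouts needed so that a move sampled with
    probability p is tried at least once with the given confidence.

    Derived from 1 - (1 - p)**n >= confidence.
    """
    if p <= 0.0:
        return math.inf  # a move the policy excludes is never tried
    if p >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


# Uniform choice among 100 legal moves (hypothetical numbers).
print(playouts_until_tried(1 / 100))
# Knowledge makes the "unpredictable" move 100 times less likely.
print(playouts_until_tried(1 / 10000))
# Knowledge prunes the move completely: no number of playouts helps.
print(playouts_until_tried(0.0))
```

With any nonzero probability the move is eventually tried, only later; the hard ceiling appears when the added knowledge drives the probability to zero.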

Jacques.


_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/