I don't think "only uniformly random playouts will scale to
perfection" because what we need for playouts is not just a simple
average of final scores but a maximum (in negmax sense) score. It
should be the perfect evaluation function.
In other words, as MC simulation is a way to get an average of a
value, when applying it to optimization problems we need some way to
focus the simulations on the _peak_ in a state space.
It may be obvious when one considers L&D problems where only one move
leads to the maximum score (live) and all other moves are bad. At such
positions it makes almost no sense to simulate all legal moves with the
same probability. So, IMHO, biasing simulations is not just a speed-up
technique but is essentially important.
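
A toy sketch of the average-versus-maximum point above, under loud
assumptions: the "position" is just a list of first-move outcomes (1 for
the single living move, 0 for every other move), and all names here
(leaf_values, uniform_average, biased_average, negamax_value) are
hypothetical helpers made up for illustration, not part of any Go program.
It is only meant to show that a uniform average over first moves sits near
1/N, while the negamax value of the position is 1.

```python
import random


def leaf_values(n_moves, key_move):
    """Toy L&D position: only key_move lives (1.0); all other moves die (0.0)."""
    return [1.0 if move == key_move else 0.0 for move in range(n_moves)]


def uniform_average(values, n_playouts, rng):
    """Monte Carlo average when the first move is picked uniformly at random."""
    total = sum(values[rng.randrange(len(values))] for _ in range(n_playouts))
    return total / n_playouts


def biased_average(values, weights, n_playouts, rng):
    """Same average, but first moves are sampled in proportion to weights."""
    return sum(rng.choices(values, weights=weights, k=n_playouts)) / n_playouts


def negamax_value(values):
    """Value for the side to move: the best available outcome, not the mean."""
    return max(values)


if __name__ == "__main__":
    rng = random.Random(0)           # fixed seed, only for repeatability
    values = leaf_values(20, key_move=7)
    # Assumed bias: the key move is weighted 10x relative to any other move.
    weights = [10.0 if v == 1.0 else 1.0 for v in values]
    print("uniform average:", uniform_average(values, 10000, rng))
    print("biased average: ", biased_average(values, weights, 10000, rng))
    print("negamax value:  ", negamax_value(values))
```

With 20 legal moves the uniform average stays close to 1/20 no matter how
many playouts are run, the biased average moves toward the peak, and only
the maximum reports that the group actually lives.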
I agree, but what I meant about uniformly random playouts is the following:
What makes a move outstanding is being unpredictable. For a total novice,
playing at the key point of a bulky five may look like a touch of genius,
but when you learn a little, it's an obvious move. The difference between a
5p and a 9p may be one or two moves nobody can predict (except a 9p). When
we add knowledge we find the _ordinary_ good moves faster, we make weaker
moves less probable, but that comes at a price, the price of making outstanding
unpredictable moves less probable also. Perhaps that introduces a ceiling.
I thought that was what you were also pointing out. Of course, I don't claim
uniformly random playouts are good; I just claim that they should (purely as
an infeasible theoretical argument) scale to perfection. That scaling
doesn't have to be linear, of course.
Jacques.