> As far as I see, if RAVE gives constant value 0 to one move, it will
> never be tested if other moves have non-zero AMAF values.
>
> A move with "real" empirical probability 0 of winning and AMAF value
> of 0.01 will always be preferred to a non-simulated move with AMAF 0.0,
> whatever may be the number of simulations.

I agree; that is why I added a statement about the prior, which implies that the AMAF value is never 0.0 but at worst decreases like 1/m, where m is the number of AMAF updates for that move.

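
To make the 1/m claim concrete, here is a minimal sketch. It assumes the prior is folded in as virtual wins and visits; the function name and the prior of 1 win in 2 visits are hypothetical choices for illustration only, not a description of any particular program.

```python
from fractions import Fraction

def amaf_value(prior_wins, prior_visits, amaf_wins, amaf_visits):
    """AMAF mean with a prior counted as virtual wins/visits (assumed scheme)."""
    return Fraction(prior_wins + amaf_wins, prior_visits + amaf_visits)

# Worst case: every one of the m AMAF updates for this move is a loss.
# Hypothetical prior: 1 virtual win out of 2 virtual visits.
values = [amaf_value(1, 2, 0, m) for m in (0, 10, 100, 1000)]

for m, v in zip((0, 10, 100, 1000), values):
    print(m, v, float(v))
```

With this scheme the value after m losing updates is 1/(2 + m): strictly positive for every m, and shrinking roughly like 1/m.
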
Thinking a little more about it, I think we have to add a hypothesis: for a given move, the number of AMAF updates is < alpha * (total number of UCT updates), with alpha < 1. That seems to hold for most of the updates (with alpha close to 0.5), but there may be cases where it does not hold.

Maybe I am confused and am saying unsound things; sorry for that. This is the kind of thing we should discuss in front of a black (or white) board.

Sylvain
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/
