> As far as I see, if RAVE gives constant value 0 to one move, it will
> never be tested if other moves have non-zero AMAF values.
>
> A move with "real" empirical probability 0 of winning and AMAF value of
> 0.01 will always be preferred to a non-simulated move with AMAF 0.0,
> whatever may be the number of simulations.
I agree; that is why I added a statement about the prior, which implies
that the AMAF value is never 0.0 but at worst decreases like 1/m, where m
is the number of AMAF updates for that move.
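To make the prior argument concrete, here is a minimal sketch assuming the prior is implemented as pseudo-counts added to the AMAF statistics (Beta-style smoothing). The names `amaf_value`, `prior_wins` and `prior_count` and their default values are hypothetical, chosen only for illustration. A move that never wins in AMAF updates keeps a value of `prior_wins / (m + prior_count)`, which is strictly positive and shrinks like 1/m.

```python
def amaf_value(amaf_wins: float, amaf_count: int,
               prior_wins: float = 1.0, prior_count: float = 2.0) -> float:
    """AMAF estimate smoothed by hypothetical prior pseudo-counts.

    With amaf_wins == 0 the result is prior_wins / (amaf_count + prior_count):
    never 0.0, and it decreases like 1/m as amaf_count (m) grows.
    """
    return (amaf_wins + prior_wins) / (amaf_count + prior_count)


if __name__ == "__main__":
    # A move that never wins: m * value tends to prior_wins (here 1.0).
    for m in (0, 10, 100, 1000):
        v = amaf_value(0.0, m)
        print(m, v, m * v)
```

The pseudo-count form is just one common way to encode a prior; any prior with positive mass on wins gives the same qualitative behaviour.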

Thinking a little more about it, I think we have to add a hypothesis:
for a given move, the number of AMAF updates is < alpha * (total number
of UCT updates), with alpha < 1. That seems to hold in most cases (with
alpha close to 0.5), but there may be cases where it does not hold.
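The hypothesis can be checked empirically on a node. The sketch below is a hypothetical diagnostic, not part of any existing program: it assumes per-move AMAF update counts are stored in a dict and the node's total number of UCT updates is known, and it lists the moves for which m < alpha * N fails. The move labels and counts in the usage example are made up.

```python
from typing import Dict, List


def moves_violating_hypothesis(amaf_counts: Dict[str, int],
                               total_uct_updates: int,
                               alpha: float = 0.5) -> List[str]:
    """Return moves whose AMAF update count m is NOT < alpha * N.

    amaf_counts: hypothetical map from move label to its AMAF update count m.
    total_uct_updates: total number N of UCT updates at the node.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    bound = alpha * total_uct_updates
    return [move for move, m in amaf_counts.items() if m >= bound]


if __name__ == "__main__":
    counts = {"D4": 40, "Q16": 55, "C3": 10}  # made-up counts
    print(moves_violating_hypothesis(counts, total_uct_updates=100))
```

With N = 100 and alpha = 0.5 the bound is 50, so only the move with 55 AMAF updates is reported.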
Maybe I am confused and saying unsound things; sorry for that. It is the
kind of thing we should discuss in front of a black (or white) board.

Sylvain
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/
