Hi David,

On Fri, Feb 8, 2008 at 6:09 PM, David Silver <[EMAIL PROTECTED]> wrote:
> > Note as well that the current implementation of MoGo (not the one at
> > the time of the ICML paper) use a different tradeoff between UCT and
> > Rave value, thanks to an idea of David Silver, which brought
> > improvements in 19x19 (where the Rave values are the most useful),
> > while it was marginal (still better) in 9x9. But anyway we here are
> > talking about 9x9, so it can't explain what you are talking about.
> >
>
> I think it is time to share this idea with the world :-)
> The idea is to estimate bias and variance to calculate the best
> combination of UCT and RAVE values.
> I have attached a pdf explaining the new formula.

Thanks!

> >> (2) (....) Depending on the playout
> >> policy, adding an upper confidence bound to the rave values can push
> >> some terrible bad moves up (like playing on 1-1). The reason seems to
> >> be that such moves are normally sampled very infrequently (so the UCB
> >> will be higher), and when they are selected (...)

Sylvain snipped the part where I explained why the values will also be
higher, so you may have missed that. Essentially, what can happen is that
the bad move is always rejected unless it serves a clear purpose, and
this purpose may then correlate strongly with winning the game (e.g., a
big capture). Of course this all depends on the knowledge in the playout
policy. IIRC Mogo's policy is mostly random when the local patterns don't
apply, so I guess the only hard-reject pattern that may cause problems
with the rave-values in Mogo is the one for eye-filling moves that win
the game.

> > That could be an explanation, but there are two points:
> > - the prior you put on top of Rave often avoid to first sample 1-1,
> > and even when you do, you very often loose just 1 playout because of
> > the UCT value you get right away.
> > - I never observed a big discrepancy between the number of Rave
> > samples for each move.
>
> Also, the upper confidence bound reduces rapidly with RAVE, because so
> many moves are played in each playout.

Yes, but why add upper confidence bounds to the rave values at all? If
they really go down that fast, does it make much of a difference? I did
of course test this, and my program actually became weaker when I added
UCBs to the rave values! The way I originally understood it, the purpose
of rave was to play stronger when the number of simulations is low. This
can already be achieved by adding a greedy component to emphasize moves
that are likely to be winning based on their rave-value alone. The
purpose of UCB in UCT is clear (to ensure sufficient exploration when the
tree grows large). UCB in rave does not really do the same.
I think its two main effects are: (1) it emphasizes the inverse order in
which the moves were added, and (2) it emphasizes exploration of moves
that are infrequently selected in the playouts. I think neither of them
is a good thing. The better the move ordering or the knowledge in the
playouts, the more it hurts...

> So even without prior
> knowledge, moves like the 1-1 point should be observed less when using
> RAVE, because they will quickly become associated with losing games.

As explained before, this may depend on how you do the playouts.

Erik
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/
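
To make effect (2) above concrete, here is a minimal Python sketch. It is not code from MoGo or any other program discussed in this thread. It assumes a UCB1-style bonus of the form c * sqrt(ln N / n) computed on RAVE counts; the function names, the constant c, and the win/visit numbers are all hypothetical and chosen only for illustration.

```python
import math


def rave_bonus(rave_visits, total_rave_visits, c=1.0):
    """UCB1-style exploration bonus on RAVE counts (assumed form)."""
    return c * math.sqrt(math.log(total_rave_visits) / rave_visits)


def rave_greedy(rave_wins, rave_visits):
    """Plain RAVE win rate, with no exploration term."""
    return rave_wins / rave_visits


def rave_ucb(rave_wins, rave_visits, total_rave_visits, c=1.0):
    """RAVE win rate plus the exploration bonus."""
    return (rave_greedy(rave_wins, rave_visits)
            + rave_bonus(rave_visits, total_rave_visits, c))


# Made-up RAVE statistics as (wins, visits). The 1-1 point is rarely
# played in the playouts, so it has far fewer RAVE samples.
stats = {"center": (520, 1000), "1-1": (9, 20)}
total = sum(visits for _, visits in stats.values())

# Greedy selection uses the RAVE value alone.
best_greedy = max(stats, key=lambda m: rave_greedy(*stats[m]))

# Adding the bonus favors the move with few RAVE samples.
best_ucb = max(stats, key=lambda m: rave_ucb(*stats[m], total))

print("greedy picks:", best_greedy)
print("UCB picks:   ", best_ucb)
```

With these numbers the rarely sampled move has the lower RAVE win rate but the much larger bonus, so the two selection rules disagree. How large the gap is in a real program depends on c and on how quickly the RAVE counts grow, which is exactly the point under discussion.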
