Re: [computer-go] New UCT-RAVE formula (was Re: computer-go Digest, Vol 43, Issue 8)

Erik van der Werf Sat, 09 Feb 2008 03:45:00 -0800

Hi David,

On Fri, Feb 8, 2008 at 6:09 PM, David Silver <[EMAIL PROTECTED]> wrote:
>  > Note as well that the current implementation of MoGo (not the one at
>  > the time of the ICML paper) uses a different tradeoff between UCT and
>  > Rave value, thanks to an idea of David Silver, which brought
>  > improvements in 19x19 (where the Rave values are the most useful),
>  > while it was marginal (still better) in 9x9. But anyway, here we are
>  > talking about 9x9, so it can't explain what you are talking about.
>  >
>
>  I think it is time to share this idea with the world :-)
>  The idea is to estimate bias and variance to calculate the best
>  combination of UCT and RAVE values.
>  I have attached a pdf explaining the new formula.


Thanks!


>  >> (2) (....) Depending on the playout
>  >> policy, adding an upper confidence bound to the rave values can push
>  >> some terribly bad moves up (like playing on 1-1). The reason seems to
>  >> be that such moves are normally sampled very infrequently (so the UCB
>  >> will be higher), and when they are selected (...)

Sylvain snipped the part where I explained why the values will also be
higher, so you may have missed that. Essentially, what can happen is
that the bad move is always rejected unless it serves a clear purpose,
and this purpose may then correlate strongly with winning the game
(e.g., a big capture). Of course, this all depends on the knowledge in
the playout policy. IIRC MoGo's policy is mostly random when the local
patterns don't apply, so I guess the only hard-rejected moves that may
cause problems with the rave-values in MoGo are eye-filling moves that
win the game.


>  > That could be an explanation, but there are two points:
>  > - the prior you put on top of Rave often avoids sampling 1-1 first,
>  > and even when you do, you very often lose just 1 playout because of
>  > the UCT value you get right away.
>  > - I never observed a big discrepancy between the number of Rave
>  > samples for each move.
>
>  Also, the upper confidence bound reduces rapidly with RAVE, because so
>  many moves are played in each playout.

Yes, but why add upper confidence bounds to the rave values at all? If
they really go down that fast, does it make much of a difference?
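
To put rough numbers on how fast the bound shrinks, here is a minimal sketch. It assumes the usual sqrt(log N / n) form for the bonus and, very crudely, that samples are spread evenly over the legal moves. The function name ucb_bonus and all of the counts are hypothetical, chosen only for illustration.

```python
import math

def ucb_bonus(parent_visits, samples, c=1.0):
    """Exploration bonus of the usual UCB1 form: c * sqrt(log N / n)."""
    return c * math.sqrt(math.log(parent_visits) / samples)

# Hypothetical numbers for a 9x9 position (assumptions, not measurements).
playouts = 1000
legal_moves = 60
moves_per_playout = 30  # moves the side to move makes in one playout

# UCT: each playout adds one sample to exactly one child of the node.
uct_samples = playouts / legal_moves

# RAVE: each playout adds a sample to every move the side to move played
# later in that playout (all-moves-as-first), so counts grow much faster.
rave_samples = playouts * moves_per_playout / legal_moves

uct_bonus = ucb_bonus(playouts, uct_samples)
rave_bonus = ucb_bonus(playouts, rave_samples)
ratio = uct_bonus / rave_bonus  # equals sqrt(moves_per_playout)
```

Under these assumptions the rave bonus is smaller than the UCT bonus by a factor of sqrt(moves_per_playout), whatever the number of playouts. That only holds for moves the playouts actually pick; a move the policy almost never plays keeps a small rave count and therefore a large bonus.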

I did of course test this and my program actually became weaker when I
added UCBs to the rave values!

The way I originally understood it, the purpose of rave was to play
stronger when the number of simulations is low. This can already be
achieved by adding a greedy component to emphasize moves that are
likely to be winning based on their rave-value alone. The purpose of
UCB in UCT is clear (to ensure sufficient exploration when the tree
grows large). UCB in rave does not really do the same. I think its
two main effects are: (1) it emphasizes the inverse order in which the
moves were added, and (2) it emphasizes exploration of moves that are
infrequently selected in the playouts. I think neither of them is a
good thing. The better the move ordering or the knowledge in the
playouts, the more it hurts...
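
The difference between a greedy rave term and a rave term with its own UCB can be sketched as follows. This is only an illustration under stated assumptions: MoveStats, score, the constants, and the sample counts are all hypothetical, and the weight beta = k / (k + n) is a simple placeholder schedule, not the formula from David's PDF (which is not reproduced in this message).

```python
import math
from dataclasses import dataclass

@dataclass
class MoveStats:
    visits: int       # times the move was tried directly from this node
    wins: float       # wins among those tries
    rave_visits: int  # playouts in which the move was played later on
    rave_wins: float  # wins among those playouts

def score(m, parent_visits, c_uct=1.0, c_rave=0.0, k=1000.0):
    """Blend of UCT and rave values; c_rave=0 makes the rave part greedy."""
    q_uct = m.wins / m.visits if m.visits else 0.5
    q_rave = m.rave_wins / m.rave_visits if m.rave_visits else 0.5
    log_n = math.log(parent_visits + 1)
    u_uct = c_uct * math.sqrt(log_n / (m.visits + 1))
    u_rave = c_rave * math.sqrt(log_n / (m.rave_visits + 1))
    # Placeholder schedule (assumption): trust rave fully at n=0, less later.
    beta = k / (k + m.visits)
    return (1 - beta) * (q_uct + u_uct) + beta * (q_rave + u_rave)

# Hypothetical statistics: a decent move the playouts pick often, and a
# 1-1 style move the playout policy almost always rejects.
good = MoveStats(visits=0, wins=0.0, rave_visits=400, rave_wins=220.0)
corner = MoveStats(visits=0, wins=0.0, rave_visits=4, rave_wins=2.0)

greedy_prefers_good = score(good, 100) > score(corner, 100)
ucb_prefers_corner = score(corner, 100, c_rave=1.0) > score(good, 100, c_rave=1.0)
```

With these made-up counts the greedy variant ranks the frequently sampled move first, while the variant with a rave UCB ranks the rarely sampled corner move first, simply because its rave count is small. Whether such a gap in rave counts shows up in practice depends on the playout policy.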


> So even without prior
>  knowledge, moves like the 1-1 point should be observed less when using
>  RAVE, because they will quickly become associated with losing games.

As explained before, this may depend on how you do the playouts.

Erik
