Re: [computer-go] New UCT-RAVE formula (was Re: computer-go Digest, Vol 43, Issue 8)

Erik van der Werf Mon, 18 Feb 2008 04:58:29 -0800

Hi David,

On Sat, Feb 16, 2008 at 7:07 PM, David Silver <[EMAIL PROTECTED]> wrote:


> Yes, but why add upper confidence bounds to the rave values at all? If
> they really go down that fast, does it make much of a difference?
>
> According to the recent experiments in MoGo, you are right :-) However, I've 
> seen slightly different results in other strong UCT-RAVE programs.
>
> I think you are right that the exploration interacts heavily with the playout 
> strategy. If the playouts heavily favour a particular response (e.g. black 
> always connects at d3), then using a UCB on RAVE is one way to ensure that
> the opposite result is also tried (e.g. what happens if white cuts at d3?).
>
> I did of course test this and my program actually became weaker when I
> added UCBs to the rave values!
>
> Interesting! Did you use prior knowledge in your RAVE values? (See below)
>
No, this was done without using prior knowledge to initialize the rave
values.

> The way I originally understood it, the purpose of rave was to play stronger
> when the number of simulations is low. This can already be achieved by
> adding a greedy component to emphasize moves that are likely to be winning
> based on their rave-value alone.
>
> Agreed, this is certainly the most important aspect of RAVE.
>
> The purpose of UCB in UCT is clear (to ensure sufficient exploration when
> the tree grows large). UCB in rave does not really do the same.
>
> Agreed.
>
> I think its two main effects are: (1) it emphasizes the inverse order in
> which the moves were added,
>
> Not sure what you mean by this.
>
I'll try to explain:

In my implementation I only have a rave estimate for a particular move in a
particular position if this move has been sampled at least once (so the
corresponding child node has to be present in the tree). New moves (and
corresponding child nodes) are added one at a time (or, with tricks like FPU,
even more slowly) until all legal moves are present. The order in which
(unexplored) legal moves are added to the tree depends on move ordering
heuristics (promising moves should be added first). A consequence of this is
that the rave-visit counters for moves added early generally receive a lot
more hits than those for moves added later. The inverse square root factor
used in computing the upper confidence bounds then gives the moves added last
(with low rave-visit counts) a high upper confidence bound, hence
emphasizing the inverse order in which the moves were added. With a good
move ordering the moves added last are nearly always bad, so emphasizing
them reduces playing strength.
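
The effect described above can be illustrated with a minimal sketch. It is
not taken from any actual program: the bonus form c / sqrt(rave_visits), the
constant c, the move names and the rave-visit counts are all assumptions
chosen only to show how the inverse square root reverses the order in which
the moves were added.

```python
import math


def rave_ucb_bonus(rave_visits, c=1.0):
    """Hypothetical exploration bonus on a rave value: c / sqrt(rave_visits)."""
    return c / math.sqrt(rave_visits)


# Hypothetical moves, listed in the order the move-ordering heuristic added
# them to the tree. The rave-visit counts are made up: moves added earlier
# have had more time to collect hits.
moves = [("first", 400), ("second", 100), ("third", 25), ("last", 4)]

# Bonus per move under the assumed formula.
bonuses = {name: rave_ucb_bonus(visits) for name, visits in moves}

# Ranking by bonus alone, largest first.
ranked_by_bonus = sorted(bonuses, key=bonuses.get, reverse=True)

for name in ranked_by_bonus:
    print(f"{name:>6}: bonus = {bonuses[name]:.3f}")
```

Under these assumed counts the ranking by bonus is exactly the reverse of the
insertion order, so the move the heuristic trusted least gets the largest
boost.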


> and (2) it emphasizes exploration of moves that are infrequently selected
> in the playouts.
>
> Agreed.
>
> I think neither of them is a good thing. The better the move ordering or
> the knowledge in the playouts, the more it hurts...
>
> Not if you also encode your playout knowledge as prior knowledge for the RAVE 
> algorithm. This way, you can specify your confidence in the choices made by 
> the playouts. The UCB is always relative to this confidence level.
>
I think the prior knowledge should improve playing strength due to an
improved rave-value estimate. However, w.r.t. the UCBs, the larger initial
counts associated with the prior knowledge simply bring the added upper
confidence bounds closer to zero, which is done even more effectively by not
adding them at all.
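
A small sketch of this point, under the same kind of assumptions (a
hypothetical bonus c / sqrt(n), with prior knowledge modelled simply as extra
initial rave visits; none of the numbers come from a real program):

```python
import math


def rave_ucb_bonus(rave_visits, prior_visits=0, c=1.0):
    """Hypothetical bonus where prior knowledge adds to the rave-visit count."""
    return c / math.sqrt(rave_visits + prior_visits)


# One move with 4 real rave visits and increasingly confident (made-up) priors.
real_visits = 4
bonus_by_prior = {
    prior: rave_ucb_bonus(real_visits, prior_visits=prior)
    for prior in (0, 12, 96)
}

for prior, bonus in bonus_by_prior.items():
    print(f"prior visits = {prior:>2}: bonus = {bonus:.3f}")
```

The more weight the prior carries, the smaller the bonus becomes; leaving the
bonus out altogether is the limiting case.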

Best,
Erik

_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/
