Hi Erik,
Thanks for the thought-provoking response!
Yes, but why add upper confidence bounds to the rave values at all? If
they really go down that fast, does it make much of a difference?
According to the recent experiments in MoGo, you are right :-)
However, I've seen slightly different results in other strong UCT-RAVE
programs.
I think you are right that the exploration interacts heavily with the
playout strategy. If the playouts heavily favour a particular response
(e.g. black always connects at d3), then using a UCB on RAVE is one
way to ensure that the opposite result is also tried (e.g. what
happens if white cuts at d3?).
I did of course test this and my program actually became weaker when I
added UCBs to the rave values!
Interesting! Did you use prior knowledge in your RAVE values? (See
below)
The way I originally understood it, the purpose of rave was to play
stronger when the number of simulations is low. This can already be
achieved by adding a greedy component to emphasize moves that are
likely to be winning based on their rave-value alone.
Agreed, this is certainly the most important aspect of RAVE.
The purpose of UCB in UCT is clear (to ensure sufficient exploration
when the tree grows large). UCB in rave does not really do the same.
Agreed.
I think its
two main effects are: (1) it emphasizes the inverse order in which the
moves were added,
Not sure what you mean by this.
and (2) it emphasizes exploration of moves that are
infrequently selected in the playouts.
Agreed.
I think neither of them is a
good thing. The better the move ordering or the knowledge in the
playouts, the more it hurts...
Not if you also encode your playout knowledge as prior knowledge for
the RAVE algorithm. This way, you can specify your confidence in the
choices made by the playouts. The UCB is always relative to this
confidence level.
Having said that, there may already be sufficient exploration without
the UCB bonuses. Perhaps all we can say is that RAVE, prior knowledge
and exploration all interact heavily, and that the best level of
exploration depends on the exact choices made.
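
To make the interaction concrete, here is a minimal sketch of one way to combine the three ingredients. It is not the code of MoGo or any other program; the function name, the parameter names, and the idea of treating the prior as a number of virtual simulations are all assumptions made for illustration.

```python
import math

def rave_score(wins, visits, prior_mean, prior_count, parent_total, c):
    """RAVE value seeded with prior knowledge, plus a UCB-style bonus.

    Hypothetical model: the prior behaves like `prior_count` virtual
    simulations with win rate `prior_mean`. `parent_total` is the total
    number of (real + virtual) RAVE samples at the parent node, and `c`
    is the exploration constant (c = 0 gives a purely greedy value).
    """
    n = visits + prior_count  # effective sample count, must be > 0
    mean = (wins + prior_mean * prior_count) / n
    bonus = c * math.sqrt(math.log(parent_total) / n)
    return mean + bonus

# A move the playouts never select, under weak and strong prior confidence.
weak = rave_score(0, 0, prior_mean=0.5, prior_count=5, parent_total=100, c=1.0)
strong = rave_score(0, 0, prior_mean=0.5, prior_count=50, parent_total=100, c=1.0)
greedy = rave_score(0, 0, prior_mean=0.5, prior_count=50, parent_total=100, c=0.0)
```

In this sketch the bonus for a rarely sampled move shrinks as prior_count grows, which is the sense in which the UCB is relative to the confidence placed in the prior. Setting c to 0 removes the bonus and leaves only the greedy, prior-seeded value.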
-Dave
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/