>
> Well, empirically, when I set the exploration component to zero it starts
> to play a lot worse. Like I wrote: the winning percentage drops to 24% vs.
> the same program with the exploration component, which is a huge difference.
>
> So if you have a different experience, you must have something else that
> overcomes this hurdle that's not part of a simple MCTS-RAVE implementation.
> I'd be very interested to learn what that is. Sylvain didn't take the bait
> ;-)
>

Here, we have a non-zero initialization of the number of wins, of the
number of simulations, of the number of Rave-wins, and of the number of
Rave-losses.
We then have a constant of 0 for the UCB-like exploration, but also an
exploratory term which is very different, and of which I am not the main
author. I will therefore let the main author give an explanation if he
wants to :-)

I point out that even before this exploratory term was added, the best
UCB-like exploration constant was 0, as soon as the initial numbers of
wins, of losses, of Rave-wins and of Rave-losses are heuristic values.
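
To make the idea concrete, here is a minimal sketch in Python of child
selection with heuristic initial counts and an exploration constant of 0.
It is not our actual code: the names (Child, score, select_child), the
prior values, and the Rave weighting schedule beta = k / (k + sims) are
all assumptions chosen for illustration, and the separate exploratory
term mentioned above is not modeled at all.

```python
import math
from dataclasses import dataclass


@dataclass
class Child:
    # All four counters start at heuristic (non-zero) values instead of 0.
    # The concrete numbers used below are made up for illustration.
    wins: float
    sims: float
    rave_wins: float
    rave_losses: float


def score(child: Child, parent_sims: float, c: float = 0.0, k: float = 50.0) -> float:
    """Blend the Monte-Carlo mean and the Rave mean, plus a UCB-like bonus.

    c is the UCB-like exploration constant (0 by default, as in the post).
    k controls a hypothetical weighting schedule beta = k / (k + sims).
    """
    mean = child.wins / child.sims
    rave_mean = child.rave_wins / (child.rave_wins + child.rave_losses)
    beta = k / (k + child.sims)  # assumed schedule, not from the post
    value = beta * rave_mean + (1.0 - beta) * mean
    # With c == 0 this bonus vanishes and selection is purely greedy
    # on the blended value.
    bonus = c * math.sqrt(math.log(max(parent_sims, 1.0)) / child.sims)
    return value + bonus


def select_child(children: list[Child], parent_sims: float, c: float = 0.0) -> int:
    """Return the index of the child with the highest score."""
    return max(range(len(children)), key=lambda i: score(children[i], parent_sims, c))


# Two children that differ only in their heuristic prior on wins.
children = [
    Child(wins=6.0, sims=10.0, rave_wins=5.0, rave_losses=5.0),
    Child(wins=4.0, sims=10.0, rave_wins=5.0, rave_losses=5.0),
]
best = select_child(children, parent_sims=20.0)
```

Because the priors keep sims and the Rave counts above zero, no division
by zero occurs for unvisited children, and the heuristic values alone
decide which child is tried first when c is 0.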
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
