> With uniformly distributed playouts, it would be something around
> c=0.2 in sqrt(c*ln(N)/M), with much more sophisticated heuristics and
> good prior biasing of the node values and then RAVE, c will approach 0
> as the need for UCB-driven exploration will decrease.


Thank you.

But the exploration coefficient c can't be equal to 0, can it? If it is,
then we are back in the situation that the first post of this thread
described (we use only the win rate).
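
To make the concern concrete, here is a small sketch of how I read the
formula quoted above: score = win rate + sqrt(c*ln(N)/M). The names
ucb_score and select_child are my own, and treating unvisited children as
"try first" is an assumption, not something taken from a real engine.

```python
import math

def ucb_score(wins, visits, parent_visits, c):
    """Win rate plus the exploration term sqrt(c * ln(N) / M)."""
    if visits == 0:
        # Assumption: an unvisited child is always tried before the others.
        return math.inf
    exploration = math.sqrt(c * math.log(parent_visits) / visits)
    return wins / visits + exploration

def select_child(children, c):
    """children is a list of (wins, visits) pairs; returns the best index."""
    parent_visits = sum(visits for _, visits in children)
    return max(
        range(len(children)),
        key=lambda i: ucb_score(children[i][0], children[i][1], parent_visits, c),
    )

# Child 0 has the better win rate, child 1 has far fewer visits.
children = [(6, 10), (1, 2)]
greedy_choice = select_child(children, c=0.0)     # exploration term vanishes
exploring_choice = select_child(children, c=0.2)  # rarely visited child gets a bonus
```

With c=0 the second term is always zero, so the choice is made on win rate
alone, which is exactly the case I am asking about.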


Another question: what should we do when the game ends in the Tree Policy
rather than in the Default Policy? Should the program stop selecting this
node (that is, stop calling the PlaySimulation procedure for it)?
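
One option I can imagine, written as a rough sketch and not as a claim about
how existing programs do it: keep the node selectable, but skip the playout
and back up the known final result directly. Node, terminal_result and the
rollout argument are hypothetical names; only PlaySimulation comes from the
discussion above.

```python
class Node:
    def __init__(self, terminal_result=None):
        # None means the game is not over here; otherwise the final
        # result (1.0 for a win, 0.0 for a loss) at this position.
        self.terminal_result = terminal_result
        self.wins = 0.0
        self.visits = 0

def play_simulation(node, rollout):
    """Update node statistics; run the Default Policy only if needed."""
    if node.terminal_result is not None:
        # Game already over inside the tree: no playout, use the exact result.
        result = node.terminal_result
    else:
        result = rollout()
    node.visits += 1
    node.wins += result
    return result

# A terminal node that is a win: the rollout function must never be called.
rollout_calls = []
finished = Node(terminal_result=1.0)
for _ in range(3):
    play_simulation(finished, lambda: rollout_calls.append(1) or 0.0)
```

In this sketch the terminal node still collects visits, so its win rate
stays exact and the selection formula decides how often to return to it.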
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go