> With uniformly distributed playouts, it would be something around
> c=0.2 in sqrt(c*ln(N)/M), with much more sophisticated heuristics and
> good prior biasing of the node values and then RAVE, c will approach 0
> as the need for UCB-driven exploration will decrease.
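
To make the role of c concrete, here is a minimal sketch of the selection rule quoted above. It assumes the usual UCB reading of the formula, where N is the parent's visit count and M is the child's visit count; the names ucb_score and select_child and the sample statistics are hypothetical and chosen only for illustration.

```python
import math


def ucb_score(wins, visits, parent_visits, c):
    """Win rate plus the exploration term sqrt(c * ln(N) / M).

    N = parent_visits and M = visits are assumed meanings of the
    symbols in the quoted formula.
    """
    if visits == 0:
        # Unvisited children are tried first (a common convention, assumed here).
        return math.inf
    return wins / visits + math.sqrt(c * math.log(parent_visits) / visits)


def select_child(children, parent_visits, c):
    """Return the index of the child with the highest score.

    children is a list of (wins, visits) pairs.
    """
    return max(
        range(len(children)),
        key=lambda i: ucb_score(children[i][0], children[i][1], parent_visits, c),
    )


# Hypothetical statistics: child 0 is well explored, child 1 is not.
children = [(6, 10), (1, 2)]
parent_visits = 12

greedy = select_child(children, parent_visits, c=0.0)    # pure win rate
explore = select_child(children, parent_visits, c=0.2)   # with exploration bonus
```

With these numbers, c=0 picks child 0 purely on win rate (0.6 against 0.5), while c=0.2 picks the less-visited child 1 because its exploration bonus is larger.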
Thank you. But the exploration coefficient c can't be exactly 0, can it? If it were, we would be back in the situation described in the first post of this thread (selecting by win rate only).

Another question: what should we do when the game ends inside the Tree Policy rather than in the Default Policy? Should the program stop selecting that node, that is, never call the PlaySimulation procedure for it again?

_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
