Hi, I'm currently trying to write a bot using UCT/RAVE, but I'm somewhat confused about the correct implementation of RAVE. In particular, I'm unsure how to calculate the exploration ("variance") term for the nodes in the tree search. From reading Gelly and Silver's original paper on RAVE, I believe that the UCB exploration term for a node x is calculated as

    sqrt(log(N) / n)

where n is the number of times that x was chosen/updated, and N is the sum of the n's for x and all siblings of x, which is also equal to the n of x_parent.
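To make sure the rest of this message is unambiguous, here is how I currently compute the plain UCT value, as a minimal Python sketch (the Node class and all field names are my own, not taken from the paper or from any engine):

    import math

    class Node:
        """A node in the search tree (my own minimal layout)."""
        def __init__(self, parent=None):
            self.parent = parent
            self.wins = 0.0    # total reward from real playouts through x
            self.visits = 0    # n: number of real updates for x
            self.children = []

    def uct_value(node, c=1.0):
        """Mean reward plus the UCB exploration term sqrt(log(N) / n).

        N is the parent's visit count (equal to the sum of n over x and
        its siblings); c is the usual exploration constant.
        """
        if node.visits == 0:
            return float('inf')  # always try unvisited children first
        n = node.visits
        N = node.parent.visits
        return node.wins / n + c * math.sqrt(math.log(N) / n)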
According to the paper (or at least as I understand it), the AMAF exploration term for RAVE is calculated as

    sqrt(log(M) / m)

where m is the number of times that x was given an AMAF "virtual" update, and M is the sum of the m's for x and all of its siblings.

However, I've also downloaded the source for the TesujiRef engine from the plug-and-go svn repository, and RAVE seems to be implemented differently there. In particular, I've noticed that the AMAF exploration term is calculated as

    sqrt(log(N) / m)

where N is the number of real updates for x_parent and m is the number of virtual updates for x (consistent with the terminology used throughout this email). Tesuji's AMAF exploration term thus seems to grow only when x is not chosen for a real update, rather than for a virtual update.

Which implementation is best? Am I simply misunderstanding one of the above?

I've also noticed that the beta value used to mix the real and AMAF estimates is calculated differently: the beta in Gelly and Silver's paper seems to decrease at an inverse-sqrt rate, whereas the beta in TesujiRefBot decreases logarithmically. Is the implementation in Tesuji more "modern" than the one described in Gelly and Silver's paper?

Thanks a lot for your help.
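P.S. In case it helps to see exactly what I mean, here is a sketch of the two AMAF variants and the beta schedule as I read the paper (same caveat as above: all names are my own, each node is assumed to carry extra rave_wins/rave_visits fields, and I haven't tried to reproduce Tesuji's logarithmic beta formula from memory):

    import math

    # Assumed extra fields on each node (my assumption, not from Tesuji):
    #   node.rave_wins   - total reward from AMAF "virtual" updates of x
    #   node.rave_visits - m: number of virtual updates of x

    def amaf_term_paper(node, c=1.0):
        """AMAF exploration term as I read Gelly & Silver:
        sqrt(log(M) / m), M = sum of m over x and its siblings."""
        m = node.rave_visits
        M = sum(s.rave_visits for s in node.parent.children)
        return c * math.sqrt(math.log(M) / m)

    def amaf_term_tesuji(node, c=1.0):
        """AMAF exploration term as I read TesujiRef:
        sqrt(log(N) / m), N = number of real updates of x_parent."""
        return c * math.sqrt(math.log(node.parent.visits) / node.rave_visits)

    def beta_paper(node, k=1000.0):
        """Beta schedule as I read the paper: beta = sqrt(k / (3n + k)),
        with k the "equivalence" parameter (beta = 1/2 when n = k).
        For large n this decays like 1/sqrt(n), hence "inverse sqrt"."""
        return math.sqrt(k / (3.0 * node.visits + k))

    def rave_value(node, c=1.0, k=1000.0):
        """Beta-weighted mix of the real mean and the AMAF mean, plus
        the paper-style exploration term (again, my reading)."""
        if node.visits == 0 or node.rave_visits == 0:
            return float('inf')
        beta = beta_paper(node, k)
        mc_mean = node.wins / node.visits
        amaf_mean = node.rave_wins / node.rave_visits
        return (1.0 - beta) * mc_mean + beta * amaf_mean \
            + amaf_term_paper(node, c)

If my beta_paper above is already a misreading of the paper, that could well be the source of my confusion about the two schedules.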
