Why are m and n different? Isn't every playout used both to update the UCT win rate and the RAVE values for the same nodes? Won't the number of UCT simulations and the number of RAVE simulations be the same?
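A minimal sketch of how the two counts could come apart, assuming the usual all-moves-as-first (AMAF) bookkeeping: the UCT count n for a move goes up only when that move is the one chosen at the node, while the RAVE count m goes up whenever the same player makes that move anywhere later in the playout. The names here (update_node, new_stats, the move labels A to D) are hypothetical and are not taken from the paper or from any program.

```python
from collections import defaultdict


def new_stats():
    """Per-node tables: UCT counts/wins (n) and RAVE counts/wins (m)."""
    return {key: defaultdict(int)
            for key in ("n", "uct_wins", "m", "rave_wins")}


def update_node(stats, playout_moves, won):
    """Update one node from a single playout.

    playout_moves: moves made by the player to move at this node, in order;
    playout_moves[0] is the move actually chosen at the node.
    won: 1 if that player won the playout, else 0.
    """
    first = playout_moves[0]
    # UCT: only the move chosen at this node gets credit.
    stats["n"][first] += 1
    stats["uct_wins"][first] += won
    # RAVE (assumed AMAF rule): every move the player made in the
    # playout gets credit, counted once per playout.
    for move in set(playout_moves):
        stats["m"][move] += 1
        stats["rave_wins"][move] += won


stats = new_stats()
update_node(stats, ["A", "B", "C"], 1)
update_node(stats, ["B", "A", "D"], 0)
update_node(stats, ["C", "A", "B"], 1)

for move in sorted(stats["m"]):
    print(move, "n =", stats["n"][move], "m =", stats["m"][move])
```

Under these assumptions the three playouts give move A a UCT count of n = 1 but a RAVE count of m = 3, so the per-move counts differ even though every playout updates both tables.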
David

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of David Silver
Sent: Friday, February 08, 2008 3:40 PM
To: [email protected]
Subject: [computer-go] New UCT-RAVE formula (was Re: computer-go Digest, Vol 43, Issue 8)

Hi Jason,

The original paper's formula for beta always felt wrong to me. I like this new one a lot better.

Good! Me too :-) It makes lots of assumptions that are not true in practice, but at least it is based on a sound principle!

Is it correct that the pdf assumes a uct bias of zero?

You could be asking one of two things, so I will try and answer both :-)

1. Yes, you are correct that we are assuming a uct bias of zero.
2. No, the assumption itself is not correct. The true value of a node in the tree is 0 or 1, given perfect play. So the UCT value (which just averages the outcomes of simulations) is significantly biased.

Calculation of the MSE seems to assume this going into the last step but doesn't simplify life by doing it in the first reduction...
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/
