> >> 4) regularized success rate (nbWins + K) / (nbSims + 2K)
> >> (the original "progressive bias" is simpler than that)
> >
> > I'm not sure what you mean here. Can you explain a bit more?

Sorry for being unclear, I hope I'll do better below.
Instead of just "number of wins" divided by "number of simulations", we use "number of wins + K" divided by "number of simulations + 2K". This amounts to starting each move with K virtual wins and K virtual losses, so it is similar to the "even game" heuristic previously cited. It prevents a move tested just once from getting a 0% success rate: with K=1, a single loss gives 1/3 instead of 0%.

If you apply UCT with constant zero in front of the sqrt(log(N)/N_i) term, then such a regularization is necessary for showing consistency of UCT for two-player games. Even with non-zero exploration terms, I guess this kind of regularization prevents the program from spending a very long time without looking at a move just because of a few bad first simulations.

This kind of detail is a bit boring, but I think K>0 is much better in many cases... well, maybe not for other implementations, depending on the other terms you have; our formula is now so long that I can no longer write it in closed form :-)

By the way, K>0 is in my humble opinion a very good idea if you want to check that UCT with a positive constant has a good effect in your code. I feel that UCT looks great with K=0 simply because of the "bad first simulation effect": with K=0 and without an exploration term, just losing the first few simulations can lead to the very bad situation in which a move is never tested again.

Best regards,
Olivier
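A minimal Python sketch of the idea, for anyone who wants to try it. It is not the formula used in our program (which is much longer); the function names `regularized_rate`, `selection_score`, `run_bandit` and `scripted`, and the scripted win/loss sequences, are hypothetical and only there to make the example deterministic. It combines the regularized rate (nbWins + K) / (nbSims + 2K) with an exploration term c * sqrt(log(N)/N_i), where N is the number of simulations so far and N_i those of move i, on a two-move bandit in which the better move loses its first playout.

```python
import itertools
import math
from collections.abc import Iterator


def regularized_rate(nb_wins: int, nb_sims: int, k: float) -> float:
    """(nbWins + K) / (nbSims + 2K): K virtual wins plus K virtual losses."""
    if nb_sims == 0 and k == 0:
        return math.inf  # no estimate at all with K = 0: force one try
    return (nb_wins + k) / (nb_sims + 2 * k)  # untried move scores 0.5 if K > 0


def selection_score(nb_wins: int, nb_sims: int, parent_sims: int,
                    k: float, c: float) -> float:
    """Regularized rate plus the exploration term c * sqrt(log(N) / N_i)."""
    rate = regularized_rate(nb_wins, nb_sims, k)
    if c == 0:
        return rate  # "UCT with constant zero": greedy on the rate alone
    if nb_sims == 0:
        return math.inf  # exploration term is unbounded for an untried move
    return rate + c * math.sqrt(math.log(parent_sims) / nb_sims)


def run_bandit(outcomes: list[Iterator[int]], steps: int,
               k: float, c: float) -> list[int]:
    """Run `steps` simulations at a single node; return visits per move.

    outcomes[i] is a scripted stream of 1 (win) / 0 (loss) results standing
    in for the playouts of move i, so the run is deterministic.
    """
    wins = [0] * len(outcomes)
    sims = [0] * len(outcomes)
    for total in range(steps):  # total = N, simulations done so far
        # max() keeps the first of several tied moves.
        best = max(range(len(outcomes)),
                   key=lambda i: selection_score(wins[i], sims[i], total, k, c))
        wins[best] += next(outcomes[best])
        sims[best] += 1
    return sims


def scripted() -> list[Iterator[int]]:
    """Hypothetical results: move 0 wins 60% but loses its first playout,
    move 1 wins only 20% but wins its first playout."""
    return [itertools.cycle([0, 1, 1, 1, 0]), itertools.cycle([1, 0, 0, 0, 0])]


print(run_bandit(scripted(), 100, k=0, c=0))    # → [1, 99]: move 0 is never retried
print(run_bandit(scripted(), 100, k=1, c=0))
print(run_bandit(scripted(), 100, k=0, c=1.0))
```

With K=0 and c=0, the better move (index 0) is tried once, scores 0%, and is never selected again: the "bad first simulation effect". With K=1 its score after that loss is 1/3 rather than 0, so it gets revisited once the weaker move's regularized rate falls to that level; with K=0 and c=1 the exploration term alone brings it back. Note that K>0 with c=0 does not rescue a bad start in general: in this toy run it works only because the other move's true rate (20%) is below 1/3.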
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/