>
>> 4) regularized success rate (nbWins + K) / (nbSims + 2K)
>> (the original "progressive bias" is simpler than that)
>>
>
> I'm not sure what you mean here. Can you explain a bit more?
>
>
Sorry for being unclear; I hope I'll do better below.

Instead of just "number of wins" divided by "number of simulations",
we use "number of wins + K" divided by "number of simulations + 2K";
this is similar to the "even game" heuristic cited earlier in the thread;
it avoids getting a 0% success rate for a move that was tested just once and lost.
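A minimal sketch of this regularized success rate, in Python for illustration only; the function name `regularized_rate` is hypothetical. Adding K virtual wins and K virtual losses means an untested move starts at 50%, and a single lost simulation no longer drives its rate to 0:

```python
def regularized_rate(wins: int, sims: int, k: float) -> float:
    """Success rate with K virtual wins and K virtual losses added (hypothetical helper)."""
    if sims == 0 and k == 0:
        # The plain win ratio is undefined for a move that was never tested.
        raise ValueError("untested move has no success rate when k == 0")
    return (wins + k) / (sims + 2 * k)

# One simulation, lost:
print(regularized_rate(0, 1, 0))  # 0.0  (plain ratio)
print(regularized_rate(0, 1, 1))  # 0.3333333333333333  (regularized, K = 1)
```

A larger K pulls rarely tested moves more strongly toward 50%; as the number of simulations grows, the influence of K vanishes.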

If you apply UCT with a constant of zero in front of the "sqrt(log(N)/N_i)"
term, then such a regularization is necessary to prove the consistency of UCT
for two-player games; and even with a non-zero exploration term, I guess
this kind of regularization prevents the program from spending a very long time
without looking at a move just because of a few bad first simulations. This
kind of detail is a bit boring, but I think K>0 is much better in many
cases... well, maybe not for other implementations, depending on the other
terms you have; our formula is so long now that I'm no longer able to write it in
closed form :-)
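For concreteness, here is one way to combine the regularized rate with the usual UCT exploration term, score_i = (w_i + K) / (N_i + 2K) + C * sqrt(log(N) / N_i), where N is the parent's simulation count and N_i the move's. It is a sketch under that assumption only and leaves out the other terms of our formula; the names `uct_score`, `k` and `c` are hypothetical. Setting c = 0 gives the purely greedy case mentioned above.

```python
import math

def uct_score(wins: int, sims: int, parent_sims: int, k: float, c: float) -> float:
    """Regularized success rate plus a UCT exploration bonus (illustrative sketch)."""
    if sims == 0:
        # Common convention (an assumption here): try every move at least once.
        return math.inf
    rate = (wins + k) / (sims + 2 * k)
    exploration = c * math.sqrt(math.log(parent_sims) / sims)
    return rate + exploration

# A move that lost its only simulation, with the parent visited 100 times:
print(uct_score(0, 1, 100, k=1, c=0.0))  # greedy, regularized only
print(uct_score(0, 1, 100, k=1, c=0.5))  # with an exploration bonus
```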
By the way, K>0 is in my humble opinion a very good idea if you want to
check that UCT with a positive constant really has a good effect in your code: I
suspect that UCT looks great with K=0 mainly because it compensates for the
"bad first simulation effect". With K=0 and without the exploration term, just
losing the first few simulations can lead to a very bad situation in which a
move is never tested again.
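The "bad first simulation effect" can be reproduced with a toy experiment. Everything here is hypothetical: two moves, purely greedy selection on the regularized rate (no exploration term), ties broken in favour of the first move, and fixed repeating outcome sequences instead of real playouts. The "good" move wins 60% of the time but loses its first simulation; the "bad" move wins 20% of the time but wins its first one.

```python
from itertools import cycle

def greedy_visits(k: float, budget: int) -> dict:
    """Greedy selection on (wins + k) / (sims + 2k); returns visit counts per move."""
    # Hypothetical deterministic outcomes (1 = win), repeated forever.
    outcomes = {
        "good": cycle([0, 1, 1, 1, 0]),  # 60% wins, first simulation lost
        "bad": cycle([1, 0, 0, 0, 0]),   # 20% wins, first simulation won
    }
    wins = {m: 0 for m in outcomes}
    sims = {m: 0 for m in outcomes}

    def score(m: str) -> float:
        if sims[m] == 0:
            return float("inf")  # every move is tried once
        return (wins[m] + k) / (sims[m] + 2 * k)

    for _ in range(budget):
        # max() returns the first maximal move, so ties go to "good".
        move = max(outcomes, key=score)
        wins[move] += next(outcomes[move])
        sims[move] += 1
    return sims

print(greedy_visits(k=0, budget=100))  # {'good': 1, 'bad': 99}
print(greedy_visits(k=2, budget=100))  # {'good': 96, 'bad': 4}
```

With K=0 the better move is abandoned after its single lost simulation, since its rate is stuck at 0 while the other move never drops to 0. With K=2 its rate after that loss is 0.4 rather than 0, so it is revisited as soon as the other move falls below that. Real playouts are random, so these counts only illustrate the mechanism.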

Best regards,
Olivier
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
