> I was surprised many MC programs are not UCT anymore.
> UCB = (wins / games) + C*sqrt( log(all_games) / games )
> But in MFG, CS, Pachi and Fuego, C = 0. So they use something like this.
> UCB_RAVE = (1-beta)*(wins / games) + beta*(rave_wins / rave_games) + somebias.
I think that in many UCT implementations, C was so small that it was close to the case C=0.

In fact, with C=0, wins/games is not asymptotically consistent (because a move at 0/1 is never selected again as long as another move has a score >0). But "(wins+K)/(games+2K)" for any K>0 makes an MCTS consistent. We've worked on this in http://hal.inria.fr/inria-00437146/ .

Olivier
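PS: To make the 0/1 problem concrete, here is a toy sketch (not code from the paper). Everything in it is an assumption made for illustration: the names raw_score, smoothed_score, run_greedy and TOY are hypothetical, K=1 is an arbitrary choice, and the playout outcomes are scripted rather than random. The "good" move loses its first playout and wins afterwards; the "bad" move wins its first playout and loses afterwards. Selection is purely greedy, i.e. C=0.

```python
def raw_score(wins, games):
    # Plain win rate wins/games. An untried move is scored +infinity
    # so that every move gets at least one playout (an assumption).
    return float("inf") if games == 0 else wins / games


def smoothed_score(wins, games, k=1.0):
    # (wins+K)/(games+2K); K=1 is an arbitrary illustrative choice.
    return (wins + k) / (games + 2 * k)


def run_greedy(score, outcomes, n_sims):
    """Greedy (C=0) selection among the moves in `outcomes`.

    outcomes[m](i) returns 1 (win) or 0 (loss) for the i-th playout
    (0-based) of move m. Returns the number of playouts per move.
    """
    wins = {m: 0 for m in outcomes}
    games = {m: 0 for m in outcomes}
    for _ in range(n_sims):
        # max() returns the first maximal move, so ties go to the
        # move listed first in `outcomes`.
        move = max(outcomes, key=lambda m: score(wins[m], games[m]))
        wins[move] += outcomes[move](games[move])
        games[move] += 1
    return games


# Scripted toy outcomes (hypothetical): early playouts are misleading.
TOY = {
    "good": lambda i: 0 if i == 0 else 1,  # unlucky first loss, then wins
    "bad": lambda i: 1 if i == 0 else 0,   # lucky first win, then losses
}

raw_games = run_greedy(raw_score, TOY, 1000)
smoothed_games = run_greedy(smoothed_score, TOY, 1000)
print("raw:     ", raw_games)
print("smoothed:", smoothed_games)
```

With the raw win rate, "good" stays at 0/1 forever, because the score 1/n of "bad" never reaches 0. With the smoothed score, "good" keeps the value K/(1+2K) while the score of "bad" decays below it, so "good" is eventually retried and then takes over.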
