> After the program plays all simulations, which move should it choose?
>
> (Wins/Visits) + SQRT(ln(...))
>
> or
>
> (Wins+Draw/2)/Visits + SQRT(ln(...))

Neither of these two formulas :-) These formulas are for choosing which
moves to simulate. For turn-based games, when all simulations are
finished, we should choose

  move = argmax_m number_of_simulations(m)

or something like that (you can introduce a bias built from the success
rate...). But this is not related to draws :-)

For the formula to be used during simulations, I think:

  (Wins/Visits) + SQRT(ln(...))
    should lead to good results asymptotically if the game is a win
    when playing optimally,

  (Wins+Draw/2)/Visits + SQRT(ln(...))
    should lead to good results asymptotically if the game is a draw
    when playing optimally.

Non-asymptotically, life is too complicated :-)

Best regards,
Olivier
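
For illustration, here is a minimal Python sketch of the distinction made
above: one rule picks the next move to simulate, and a different rule picks
the move actually played. Everything named here is hypothetical (MoveStats,
selection_score, draw_weight, the constant c). The message elides the
argument of ln(...), so the exploration term below assumes the usual UCB1
form sqrt(c * ln(parent_visits) / visits); that is an assumption, not
something stated in the message.

```python
import math
from dataclasses import dataclass


@dataclass
class MoveStats:
    """Per-move counters gathered during the search (hypothetical names)."""
    wins: int = 0
    draws: int = 0
    visits: int = 0


def selection_score(stats, parent_visits, draw_weight=0.0, c=2.0):
    """Score used DURING the search to decide which move to simulate next.

    draw_weight=0.0 gives Wins/Visits; draw_weight=0.5 gives
    (Wins+Draw/2)/Visits. The exploration term is an assumed UCB1 form,
    since the original message only writes SQRT(ln(...)).
    """
    if stats.visits == 0:
        return math.inf  # try every move at least once
    mean = (stats.wins + draw_weight * stats.draws) / stats.visits
    exploration = math.sqrt(c * math.log(parent_visits) / stats.visits)
    return mean + exploration


def select_move_to_simulate(children, draw_weight=0.0):
    """Pick the move with the highest selection score."""
    parent_visits = sum(s.visits for s in children.values())
    return max(
        children,
        key=lambda m: selection_score(children[m], parent_visits, draw_weight),
    )


def choose_final_move(children):
    """AFTER all simulations: argmax over the number of simulations."""
    return max(children, key=lambda m: children[m].visits)


if __name__ == "__main__":
    children = {
        "a": MoveStats(wins=60, draws=10, visits=100),
        "b": MoveStats(wins=9, draws=0, visits=10),
        "c": MoveStats(wins=2, draws=6, visits=10),
    }
    print("simulate next:", select_move_to_simulate(children))
    print("play:", choose_final_move(children))
```

In this toy example the rarely visited move "b" gets the next simulation
because of its exploration bonus, while the move played is "a", the most
simulated one. The draw weighting only enters the selection score, never
the final choice.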
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
