Dear all,

There are papers that apply a bandit formula only to the n^alpha "first" moves, for a given ordering of the moves (Coulom; Chaslot et al.; Couetoux et al.):
Coulom: http://hal.inria.fr/inria-00149859_v1/
Chaslot et al.: http://www.personeel.unimaas.nl/g-chaslot/papers/progressivemcts.pdf
Couetoux et al.: http://hal.archives-ouvertes.fr/hal-00542673/en/
This is known to be efficient in some cases when the unvisited moves are ordered randomly (no RAVE values, no heuristic value), in particular when there are infinitely many legal moves: roughly speaking, it is better to study 100 moves with 100 simulations each than to sample 10000 moves with only 1 simulation each. (The paper by Wang, Audibert & Munos is good on this topic: http://certis.enpc.fr/~audibert/Mes%20articles/NIPS08.pdf.) In Go we don't use this, but we do for some other problems, in particular when there is no ranking of the unvisited moves. In this context our constant alpha is usually 1/2.

Would anyone like to share their experience with this (with or without heuristic ranking of the legal moves)?

Best regards,
Olivier
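(For readers less familiar with the idea, here is a minimal sketch of this kind of progressive widening in a UCT-style selection step. The dict-based node layout, the function name and the UCB1 constant are illustrative assumptions, not code from any of the papers above: with n visits of the node, only the first ceil(n^alpha) moves of the fixed ordering are candidates for the bandit formula, so the candidate set grows slowly as the node is visited more often.)

```python
import math

def select_move(node, alpha=0.5, c=1.0):
    """Progressive widening: with n visits, only the first ceil(n^alpha)
    moves (in a fixed, here arbitrary, ordering) enter the bandit formula.
    `node` is an illustrative dict: {"visits": int, "moves": list,
    "children": {move: {"visits": int, "wins": float}}}."""
    n = node["visits"]
    k = max(1, math.ceil(n ** alpha))          # widened candidate count
    candidates = node["moves"][:k]             # fixed move ordering
    best, best_score = None, -float("inf")
    for m in candidates:
        stats = node["children"].get(m)
        if stats is None or stats["visits"] == 0:
            return m                           # try unvisited candidates first
        # plain UCB1 over the widened candidate set
        score = stats["wins"] / stats["visits"] \
            + c * math.sqrt(math.log(n) / stats["visits"])
        if score > best_score:
            best, best_score = m, score
    return best
```

With alpha = 1/2, a node visited 100 times considers only its first 10 moves, even if thousands of moves are legal; the remaining moves become candidates only as the visit count grows.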
_______________________________________________ Computer-go mailing list [email protected] http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
