Dear all,

   there are papers that apply the bandit formula only to the "first"
n^alpha moves, for a given ordering of the moves (Coulom; Chaslot et al.; Couetoux et al.):
Coulom: http://hal.inria.fr/inria-00149859_v1/
Chaslot et al:
http://www.personeel.unimaas.nl/g-chaslot/papers/progressivemcts.pdf
Couetoux et al: http://hal.archives-ouvertes.fr/hal-00542673/en/

This is known to be efficient in some cases where unvisited moves are
ordered randomly (no RAVE values, no heuristic values), in particular
when there are infinitely many legal moves (roughly speaking, it's better
to study 100 moves with 100 simulations each than to sample 10000 moves
with only 1 simulation each).
(the paper by Wang, Audibert & Munos is a good reference on this:
http://certis.enpc.fr/~audibert/Mes%20articles/NIPS08.pdf).

In Go we don't use this, but for some other problems we do, in particular
when there is no ranking of unvisited moves. In this context, our constant
"alpha" is usually 1/2.
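To make the idea concrete, here is a minimal sketch (not any of the cited
papers' exact implementations) of UCB1 selection with this kind of
progressive widening: with n visits at a node, only the first
ceil(n^alpha) moves in the fixed ordering are candidates. The node layout
(`visits`, `ordered_moves`, `stats`) and the exploration constant c are
illustrative assumptions.

```python
import math

def select_move(node, alpha=0.5, c=1.4):
    """UCB1 with progressive widening (a sketch, assuming a fixed
    move ordering): with n visits, only the first ceil(n**alpha)
    moves are candidate arms; the rest stay pruned for now."""
    n = node["visits"]
    width = max(1, math.ceil(n ** alpha))
    candidates = node["ordered_moves"][:width]  # fixed (here: random) ordering
    best, best_score = None, -float("inf")
    for move in candidates:
        stats = node["stats"].get(move, {"n": 0, "w": 0.0})
        if stats["n"] == 0:
            return move  # always try an unvisited candidate first
        # mean reward plus UCB1 exploration bonus
        score = stats["w"] / stats["n"] + c * math.sqrt(math.log(n) / stats["n"])
        if score > best_score:
            best, best_score = move, score
    return best
```

With alpha = 1/2, a node with 100 visits considers only its first 10
moves, which is exactly the "study fewer moves more deeply" trade-off
described above.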

Would anyone like to share their experience with this (with or without
heuristic ranking of legal moves)?

Best regards,
Olivier
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
