I recently tried, to no avail, to get ELO features, as described in the first paper you link, working in my playouts (see: "Oakfoam and ELO Features"). That has not changed since, but I then decided to try "progressive widening" as described in Coulom's paper: it uses the ELO feature strengths to order the moves and then unprunes moves according to the number of playouts.
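Roughly, the unpruning schedule looks something like this (an illustrative Python sketch of a Coulom-style geometric schedule; the function names and parameter values here are made up for the example, not Oakfoam's actual settings):

```python
def widened_count(n, k_init=2, t_init=40.0, ratio=1.4):
    """Number of prior-ordered moves left unpruned after n playouts
    at this node. Move k_init + i is unpruned once n reaches
    t_init * ratio**i (geometric thresholds; all parameters are
    illustrative, not tuned values)."""
    k, threshold = k_init, t_init
    while n >= threshold:
        k += 1
        threshold *= ratio
    return k

def select(children, n, score):
    """Choose the best move among the currently unpruned prefix.
    `children` must already be sorted by descending prior strength
    (e.g. ELO feature gammas); `score` is the in-tree value
    (e.g. a UCT score)."""
    return max(children[:widened_count(n)], key=score)
```

The point is that early on only the few highest-prior moves are searched, and the candidate set grows slowly as the node accumulates playouts, so a wrongly-pruned move does eventually get considered.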
I managed to get a large strength increase by doing progressive widening/unpruning: on the order of 300 ELO on 9x9 against GnuGo. I then retrained the features for 19x19, which cost me some strength on 9x9 but should do much better on 19x19. Some of the parameters seem to be quite delicate, and a small change can give quite a change in strength. It seems that my program usually finds a good line of thinking, but sometimes the best move is pruned and doesn't get looked at sufficiently. It's difficult to say whether this will hamper my program in the future, but for now it works well.

--
Francois van Niekerk
Email: [email protected] | Twitter: @francoisvn
Cell: +2784 0350 214 | Website: http://leafcloud.com

On Thu, Jan 13, 2011 at 8:59 PM, Olivier Teytaud <[email protected]> wrote:
> Dear all,
>
> there are papers using a bandit formula only for the n^alpha "first"
> moves, for a given order of moves (Coulom, Chaslot et al, Couetoux et al):
> Coulom: http://hal.inria.fr/inria-00149859_v1/
> Chaslot et al:
> http://www.personeel.unimaas.nl/g-chaslot/papers/progressivemcts.pdf
> Couetoux et al: http://hal.archives-ouvertes.fr/hal-00542673/en/
>
> This is known to be efficient in some cases when we use a random ordering
> of unvisited moves (no RAVE values, no heuristic value), in particular with
> infinitely many legal moves (roughly speaking, it's better to study 100
> moves with 100 simulations each than to sample 10000 moves with only 1
> simulation each).
> (The paper by Wang, Audibert & Munos is good on this:
> http://certis.enpc.fr/~audibert/Mes%20articles/NIPS08.pdf)
>
> In Go we don't use this, but for some other problems we do, in particular
> with no ranking of unvisited moves. Our constant
> "alpha" is usually 1/2 in this context.
>
> Anyone wanting to share their experience with this (with or without
> heuristic ranking of legal moves)?
>
> Best regards,
> Olivier
>
> _______________________________________________
> Computer-go mailing list
> [email protected]
> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
