We had some moderate success with a dynamic playout policy in dimwit. We associated a real number with each (side, point, 3x3 pattern). At the end of a playout we looked at the difference between the playout's score and the sum of these numbers, and we adjusted the numbers slightly to make that difference smaller (essentially a simple gradient-descent step). During the playout we used this information to learn forced responses: we first picked a random move, but if a neighbor of the last move had an associated number 5 points higher than the random move's, we played the neighbor instead. This was a huge improvement over light playouts, but I don't know whether this type of idea would work well on top of heavier playouts.
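A rough sketch of the idea, in case it helps. All the names here (FeatureTable, choose_move, the board representation) are made up for illustration, not from dimwit; the board, legality, and pattern extraction are stubbed out, and the learning rate is a guess:

```python
import random

LEARNING_RATE = 0.01           # assumed; not from the original description
FORCED_RESPONSE_MARGIN = 5.0   # the "5 points higher" threshold from the post

class FeatureTable:
    """One real number per (side, point, 3x3-pattern) key, default 0."""
    def __init__(self):
        self.w = {}  # key -> float

    def value(self, key):
        return self.w.get(key, 0.0)

    def update(self, keys, score):
        # End-of-playout step: nudge the weights of the features seen in
        # the playout so that their sum moves toward the observed score.
        error = score - sum(self.value(k) for k in keys)
        step = LEARNING_RATE * error / max(len(keys), 1)
        for k in keys:
            self.w[k] = self.value(k) + step

def choose_move(table, side, legal_moves, neighbors_of_last, pattern_at):
    """Pick a random move, but switch to a neighbor of the last move if
    that neighbor's number beats the random move's by the margin."""
    move = random.choice(legal_moves)
    move_val = table.value((side, move, pattern_at(move)))
    for n in neighbors_of_last:
        if n in legal_moves:
            n_val = table.value((side, n, pattern_at(n)))
            if n_val > move_val + FORCED_RESPONSE_MARGIN:
                return n  # learned forced response
    return move
```

For example, if the table has learned a value of 10 for capturing back at (3,3), choose_move will override a random pick whose value is 0, since 10 exceeds 0 + 5.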
Álvaro.

On Wed, May 25, 2011 at 11:16 AM, Stefan Kaitschick <[email protected]> wrote:
>
>> I suppose that is called "Adaptive Playout".
>> Hendrik Baier reported LGRF heuristics and lots of other failed methods.
>>
>> www.ke.tu-darmstadt.de/lehre/arbeiten/master/2010/Baier_Hendrik.pdf
>>
>> --
>> Yamato
>
> Thanks for the link.
>
> The author comes to a slightly different conclusion though:
>
> "In summary, it can be stated that the results of using move
> replies in dynamic playout policies are encouraging and
> justify further research."
>
> But it does seem that it's a stony field to plow. (Pardon the pun.)
>
> Stefan
> _______________________________________________
> Computer-go mailing list
> [email protected]
> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
