We had some moderate success with a dynamic playout policy in dimwit.
We had a real number associated with each (side, point, 3x3 pattern).
At the end of a playout we looked at the difference between the score
and the sum of these numbers, and we changed the numbers slightly to
try to make the difference smaller (sort of a simple gradient
descent). In the playout we used this information to learn forced
responses: We first picked a random move, but if there was a neighbor
of the last move that had an associated number 5 points higher than
the random move's number, we would play the neighbor instead. This was
a huge improvement over light playouts, but I don't know whether this
type of idea would also work well on top of heavier playouts.

Álvaro.


On Wed, May 25, 2011 at 11:16 AM, Stefan Kaitschick
<[email protected]> wrote:
>
>> I suppose that is called "Adaptive Playout".
>> Hendrik Baier reported on LGRF heuristics and a lot of other failed methods.
>>
>> www.ke.tu-darmstadt.de/lehre/arbeiten/master/2010/Baier_Hendrik.pdf
>>
>> --
>> Yamato
>
> Thanks for the link.
>
> The author comes to a slightly different conclusion though:
>
> "In summary, it can be stated that the results of using move
> replies in dynamic playout policies are encouraging and
> justify further research."
>
>
> But it does seem that it's a stony field to plow. (Pardon the pun.)
>
> Stefan
> _______________________________________________
> Computer-go mailing list
> [email protected]
> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
>