Hi Oliver,
Unfortunately, it's not easy to understand the effects of playout policy
modifications on the behavior of MCTS. Some variations indeed refresh so
rarely that their suggestions are pretty old most of the time. However,
old doesn't necessarily mean outdated, as some types of information
appear to be quite valuable throughout the search tree. It's hard to
predict how valuable and how generalizable a given type of information
is going to be on average.
An example from our paper: Move replies to single moves (LGRF-1) are
used in 27.1% of moves. They stay in the reply table for only 4.2
playouts on average, since they are constantly overwritten or deleted.
Move replies to pairs of moves (LGRF-2), however, are applied just as
often (27.7%) but remain in the table for 1700 playouts on average!
Yet they still provide very useful information, as we can see from the
playing strength.
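
In case it helps to make the two metrics concrete, here is a rough Python
sketch of how one could instrument a reply table to measure them. It is not
the code from our paper: the class ReplyTable, its method names, and the
choice to keep an entry's age when the same reply is stored again are all
assumptions made for illustration. A context is a tuple of preceding moves,
one move for an LGRF-1 style table and two for an LGRF-2 style table.

```python
class ReplyTable:
    """Toy reply table with forgetting, plus bookkeeping (hypothetical)."""

    def __init__(self):
        self.reply = {}       # context -> (reply move, playout index when stored)
        self.lookups = 0      # how often the policy asked for a reply
        self.hits = 0         # how often a stored reply was available
        self.lifetimes = []   # playouts survived by each entry that was removed

    def suggest(self, context):
        """Return the stored reply for this context, or None."""
        self.lookups += 1
        entry = self.reply.get(context)
        if entry is None:
            return None
        self.hits += 1
        return entry[0]

    def _drop(self, context, now):
        _, born = self.reply.pop(context)
        self.lifetimes.append(now - born)

    def store(self, context, move, now):
        """After a won playout: remember move as the reply to context."""
        if context in self.reply:
            if self.reply[context][0] == move:
                return  # assumption: an identical reply keeps its original age
            self._drop(context, now)  # overwritten by a different reply
        self.reply[context] = (move, now)

    def forget(self, context, move, now):
        """After a lost playout: delete the reply if it was the one played."""
        entry = self.reply.get(context)
        if entry is not None and entry[0] == move:
            self._drop(context, now)

    def usage_rate(self):
        return self.hits / self.lookups if self.lookups else 0.0

    def mean_lifetime(self):
        if not self.lifetimes:
            return 0.0
        return sum(self.lifetimes) / len(self.lifetimes)
```

Note that mean_lifetime() only counts entries that have already been
overwritten or deleted, so replies still sitting in the table at the end of
the search are left out; whether to include them is another judgment call.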
Hendrik
Hendrik - did you look at any metrics on the variations to see if you could establish
why most of them were not successful? I was wondering if looking at the percentage of
suggestions made by the policy or the refresh rate would suggest what the problem is with
some of the others. For example, a policy that provides a "good reply"
nearly 100% of the time with a low refresh rate is probably too narrow for good
exploration.
Oliver