Hi Oliver,

Unfortunately, it's not easy to understand how modifications to the playout policy affect the behavior of MCTS. Some variations do indeed refresh so rarely that their suggestions are quite old most of the time. However, old doesn't necessarily mean outdated: some types of information appear to remain valuable throughout the search tree. It's hard to predict how valuable and how generalizable a given type of information is going to be on average.

An example from our paper: replies to single moves (LGRF-1) are used in 27.1% of moves. They stay in the reply table for only 4.2 playouts on average, since they are constantly overwritten or deleted. Replies to pairs of moves (LGRF-2), however, are applied just as often (27.7%) but remain in the table for 1700 playouts on average! Yet they still provide very useful information, as we can see from the playing strength.
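
To make the two statistics concrete, here is a minimal Python sketch of how the usage rate and the average entry lifetime of a reply table could be measured. It is only an illustration, not the code behind the numbers above. The class and method names (ReplyTable, store, forget, suggest) are hypothetical, and it assumes that "used" means a lookup found a stored reply and that "lifetime" is the number of playouts between storing an entry and overwriting or deleting it. Legality checks on the suggested move are left out.

```python
class ReplyTable:
    """Last-good-reply table with forgetting, plus usage and lifetime counters.

    A context is a tuple of preceding moves: one move for an LGRF-1 style
    table, two moves for an LGRF-2 style table (assumed representation).
    """

    def __init__(self):
        self.replies = {}    # context -> (reply move, playout index when stored)
        self.lifetimes = []  # playouts survived by each retired entry
        self.lookups = 0
        self.hits = 0

    def suggest(self, context):
        """Return the stored reply for this context, or None if there is none."""
        self.lookups += 1
        entry = self.replies.get(context)
        if entry is None:
            return None
        self.hits += 1
        return entry[0]

    def _retire(self, context, playout):
        # Remove the entry (if any) and record how long it stayed in the table.
        old = self.replies.pop(context, None)
        if old is not None:
            self.lifetimes.append(playout - old[1])

    def store(self, context, reply, playout):
        """Record a reply seen in a won playout, overwriting a different one."""
        old = self.replies.get(context)
        if old is not None and old[0] == reply:
            return  # same reply again: the entry simply stays in the table
        self._retire(context, playout)
        self.replies[context] = (reply, playout)

    def forget(self, context, reply, playout):
        """Delete a stored reply that was played in a lost playout."""
        old = self.replies.get(context)
        if old is not None and old[0] == reply:
            self._retire(context, playout)

    def usage_rate(self):
        return self.hits / self.lookups if self.lookups else 0.0

    def mean_lifetime(self):
        if not self.lifetimes:
            return 0.0
        return sum(self.lifetimes) / len(self.lifetimes)
```

With one table keyed on single moves and another keyed on pairs of moves, comparing usage_rate() and mean_lifetime() gives the kind of contrast described above: similar usage, very different lifetimes. Entries still in the table at the end of the search are not counted in this sketch, which would bias the lifetime estimate downward for long-lived entries.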

Hendrik

   Hendrik - did you look at any metrics on the variations to see if you could
   establish why most of them were not successful? I was wondering if looking
   at the percentage of suggestions made by the policy or the refresh rate
   would suggest what the problem is with some of the others. For example, a
   policy which is providing a "good reply" nearly 100% of the time with a low
   refresh rate is probably too narrow for good exploration.

   Oliver


