Hi, as a result of my oakfoam scaling tests I had a look at our progressive bias implementation.
I noticed that the playing strength is quite sensitive to the exact form of the progressive bias term. I looked into pachi and the paper "Progressive Strategies for Monte-Carlo Tree Search", but I could not find a mathematical reason for the forms they use. Pachi's implementation was justified by efficiency (if I understood correctly), and "Progressive Strategies for Monte-Carlo Tree Search" uses an additive term H_B/n_i, with H_B representing heuristic knowledge and n_i the number of playouts of the node.

On the one hand, I suspected that using the playouts of the node (and not the playouts of the parent) interferes with the UCT term sqrt(log(N)/n_i), which led me to change this. I also do not see a mathematical reason for scaling with 1/N: why not 1/N^2 or something like exp(-c*N)?

On the other hand, H_B is not specified in any way. One may tend to use the gammas (from "Computing Elo Ratings of Move Patterns in the Game of Go"), but since gammas combine as products, I thought it might be more correct to use their log as an additive term.

So my current progressive bias term is log(gamma)/N, with gamma from the Elo paper and N being the playouts of the parent node. (This gives about 80 Elo improvement over the term gamma/n_i, tested on 9x9 with 5000 playouts/move against pachi.)

But I would feel better with mathematical arguments for using 1/N and log(gamma). Any hints would be greatly appreciated :)

Detlef

_______________________________________________
Computer-go mailing list
Computer-go@dvandva.org
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
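
P.S. To make the comparison concrete, here is a minimal Python sketch of the two bias terms discussed in this message, gamma/n_i and log(gamma)/N, added to a plain UCT value. This is not the oakfoam or pachi code. All function names, the exploration constant c, and the assumption that both playout counts are at least 1 are hypothetical choices made for illustration only.

```python
import math


def uct_term(parent_playouts, child_playouts, c=1.0):
    # Exploration term c * sqrt(log(N) / n_i).
    # c is an assumed tuning constant; both counts are assumed to be >= 1.
    return c * math.sqrt(math.log(parent_playouts) / child_playouts)


def bias_gamma_over_child(gamma, child_playouts):
    # Baseline term from the message: gamma / n_i (playouts of the node itself).
    return gamma / child_playouts


def bias_log_gamma(gamma, parent_playouts):
    # Variant from the message: log(gamma) / N (playouts of the parent).
    # Note that this is negative for gamma < 1 and zero for gamma == 1.
    return math.log(gamma) / parent_playouts


def selection_value(win_rate, gamma, parent_playouts, child_playouts,
                    variant="log_gamma", c=1.0):
    # Value used to pick a child: win rate + UCT term + progressive bias.
    if variant == "log_gamma":
        bias = bias_log_gamma(gamma, parent_playouts)
    elif variant == "gamma_over_n":
        bias = bias_gamma_over_child(gamma, child_playouts)
    else:
        raise ValueError("unknown variant: " + variant)
    return win_rate + uct_term(parent_playouts, child_playouts, c) + bias


if __name__ == "__main__":
    # Hypothetical numbers, only to show how the two terms differ in scale.
    for variant in ("gamma_over_n", "log_gamma"):
        print(variant, selection_value(0.5, 3.0, 1000, 20, variant=variant))
```

With the parent count in the denominator, every child of a node has its bias divided by the same N, so the ordering by log(gamma) among siblings is fixed and only the weight of the bias relative to the win rate and the UCT term shrinks as the parent is visited more often.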