Hi,

As a result of my oakfoam scaling tests, I had a look at our progressive
bias implementation.

I noticed that the playing strength is quite sensitive to the exact
form of the progressive bias term. I looked into pachi and the
"Progressive Strategies for Monte-Carlo Tree Search" paper.

I could not find a mathematical justification for the formulas used.

Pachi's implementation was justified by implementation efficiency (if I
understood correctly), and

"Progressive Strategies for Monte-Carlo Tree Search"

uses an additive term H_B/n_i, with H_B representing heuristic
knowledge and n_i the number of playouts of the node.

On the one hand, I wondered whether using the playouts of the node (and
not the playouts of the parent) interferes with the UCT term
sqrt(log(N)/n_i), which led me to change this. I also do not see a
mathematical reason for scaling with 1/N: why not 1/N^2 or something
like exp(-c*N)?
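
To make the alternatives concrete, here is a small sketch comparing how
fast the three candidate weights decay. The function name, the schedule
names and the constant c=0.01 are made up for illustration; they come
from neither oakfoam nor pachi.

```python
import math

def decay_weight(n, schedule="inverse", c=0.01):
    """Weight applied to the heuristic after n playouts (assumes n >= 1)."""
    if schedule == "inverse":
        return 1.0 / n                 # 1/N
    if schedule == "inverse_square":
        return 1.0 / (n * n)           # 1/N^2
    if schedule == "exponential":
        return math.exp(-c * n)        # exp(-c*N), c is an arbitrary constant
    raise ValueError(f"unknown schedule: {schedule}")

# Print the three weights side by side for a few playout counts.
for n in (1, 10, 100, 1000):
    row = [decay_weight(n, s) for s in ("inverse", "inverse_square", "exponential")]
    print(n, row)
```

All three go to zero as N grows, so they differ only in how long the
heuristic keeps influencing the search.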

On the other hand, H_B is not specified at all. One may tend to use gammas
(from "Computing Elo Ratings of Move Patterns in the Game of Go"), but
as gammas are products, I thought it might be more correct to use their
log as an additive term.
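
A tiny sketch of the reasoning, assuming a move's gamma is the product
of the gammas of its matching features (the feature values below are
made up): taking the log turns that product into a sum, which fits an
additive bias term more naturally.

```python
import math

# Hypothetical gammas of the features matching one move (illustrative only).
feature_gammas = [2.5, 0.8, 4.0]

# The move's gamma is the product of its feature gammas.
move_gamma = math.prod(feature_gammas)

# The log of the product equals the sum of the per-feature logs.
log_sum = sum(math.log(g) for g in feature_gammas)
assert math.isclose(math.log(move_gamma), log_sum)
```

Note that on the log scale a feature with gamma below 1 contributes a
negative amount, whereas the raw gamma is always positive.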

So my current progressive term is

log(gamma)/N,

with gamma from the Elo paper and N being the playouts of the parent
node. (This gave about 80 Elo improvement over the term gamma/n_i, tested
on 9x9 with 5000 playouts/move against pachi.)
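
For reference, a minimal sketch of the two variants inside a selection
score. This is not the actual oakfoam or pachi code: the function names,
the exploration constant c=1.0 and the way the terms are summed are
assumptions, and every child is assumed to have at least one playout.

```python
import math

def uct_term(parent_playouts, child_playouts, c=1.0):
    # Exploration term c * sqrt(log(N) / n_i); c is an assumed constant.
    return c * math.sqrt(math.log(parent_playouts) / child_playouts)

def bias_paper_style(gamma, child_playouts):
    # H_B / n_i with H_B = gamma: decays with the child's own playouts.
    return gamma / child_playouts

def bias_log_parent(gamma, parent_playouts):
    # log(gamma) / N: decays with the parent's playouts.
    return math.log(gamma) / parent_playouts

def select_score(win_rate, gamma, parent_playouts, child_playouts, variant):
    if variant == "paper":
        bias = bias_paper_style(gamma, child_playouts)
    elif variant == "log_parent":
        bias = bias_log_parent(gamma, parent_playouts)
    else:
        raise ValueError(f"unknown variant: {variant}")
    return win_rate + uct_term(parent_playouts, child_playouts) + bias

# Same child under both variants (numbers are illustrative only).
for variant in ("paper", "log_parent"):
    print(variant, select_score(0.5, 4.0, 100, 8, variant))
```

One difference visible here: with log(gamma)/N all siblings share the
same divisor, so the bias fades for all of them at the same rate, while
gamma/n_i fades faster for the children that are visited more often.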

But I would feel better with mathematical arguments for using 1/N and
log(gamma).


Any hints would be greatly appreciated :)

Detlef

_______________________________________________
Computer-go mailing list
Computer-go@dvandva.org
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
