Gelly and Silver ("Combining Online and Offline Knowledge in UCT", section 6) give this formula for the weight given to RAVE values (as opposed to the direct MC values):

sqrt(k / (3*n(s) + k))

Here, k is a constant and n(s) is the number of playouts through state s. Clearly, as the number of playouts increases, this approaches zero.

Helmbold and Parker-Wood ("All-Moves-As-First Heuristics in Monte-Carlo Go") cite the Gelly and Silver paper, but give a different formula! Adjusting for notation, they use:

(k - n(s)) / k, or 0 if this expression is negative

This also converges toward (and then sticks at) zero, but it is not the same formula.

Why are they different? Does it matter? Is there an explanation anywhere for Gelly and Silver's more elaborate formula? Is there anything wrong with k / (n(s) + k)?
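
To make the comparison concrete, here is a minimal Python sketch of the three weight schedules exactly as written above. The function names and the value k = 1000 are my own placeholders for illustration; neither is taken from the papers, and this is not a claim about how any particular program implements RAVE.

```python
import math


def beta_sqrt(n, k):
    """Weight sqrt(k / (3*n + k)), the first formula quoted above."""
    return math.sqrt(k / (3 * n + k))


def beta_linear(n, k):
    """Weight (k - n) / k, clamped to 0 when negative (second formula above)."""
    return max(0.0, (k - n) / k)


def beta_simple(n, k):
    """Weight k / (n + k), the simpler alternative asked about above."""
    return k / (n + k)


if __name__ == "__main__":
    k = 1000  # hypothetical constant, chosen only for illustration
    print("n, sqrt, linear, simple")
    for n in (0, 100, 1000, 10000):
        print(n,
              round(beta_sqrt(n, k), 3),
              round(beta_linear(n, k), 3),
              round(beta_simple(n, k), 3))
```

With these definitions all three weights equal 1 at n(s) = 0. At n(s) = k the first and third both equal 0.5, while the clamped linear one has already reached 0.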

On a related note, in a message on this list, David Silver gives a newer formula:

http://computer-go.org/pipermail/computer-go/2009-May/018251.html

Was this ever published? (Orego is using this newer formula, and it appears to work well.)

Peter Drake
http://www.lclark.edu/~drake/


