[computer-go] Conflicting RAVE formulae

Peter Drake Mon, 14 Sep 2009 10:59:22 -0700

Gelly and Silver ("Combining Online and Offline Knowledge in UCT", section 6) give this formula for the weight given to RAVE values (as opposed to the direct MC values):


sqrt(k / (3*n(s) + k))

Here, k is a constant and n(s) is the number of playouts through state s. Clearly, as the number of playouts increases, this approaches zero.

Helmbold and Parker-Wood ("All-Moves-As-First Heuristics in Monte-Carlo Go") cite the Gelly and Silver paper, but give a different formula! Adjusting for notation, they use:


(k - n(s)) / k, or 0 if this expression is negative

This also converges toward (and then sticks at) zero, but it is not the same formula.

Why are they different? Does it matter? Is there an explanation anywhere for Gelly and Silver's more elaborate formula? Is there anything wrong with k / (n(s) + k)?
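
To make the comparison concrete, here is a minimal Python sketch that evaluates the three weights discussed above at a few playout counts. The function names and the value k = 1000 are my own assumptions for illustration; they come from neither paper nor from Orego.

```python
import math


def gelly_silver_weight(n, k):
    """sqrt(k / (3*n + k)): the weight quoted above from Gelly and Silver."""
    return math.sqrt(k / (3 * n + k))


def clamped_linear_weight(n, k):
    """(k - n) / k, or 0 if negative: the second formula quoted above."""
    return max(0.0, (k - n) / k)


def simple_ratio_weight(n, k):
    """k / (n + k): the simpler alternative asked about above."""
    return k / (n + k)


if __name__ == "__main__":
    k = 1000  # hypothetical constant, chosen only for illustration
    print("n, sqrt form, clamped linear, simple ratio")
    for n in (0, 100, 1000, 10000):
        print(
            n,
            round(gelly_silver_weight(n, k), 3),
            round(clamped_linear_weight(n, k), 3),
            round(simple_ratio_weight(n, k), 3),
        )
```

All three start at 1 when n(s) = 0. At n(s) = k, the square-root form and the simple ratio both give 0.5, while the clamped linear form has already reached 0. For larger n(s), the square-root form decays more slowly than the simple ratio.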

On a related note, in a message on this list, David Silver gives a newer formula:


http://computer-go.org/pipermail/computer-go/2009-May/018251.html

Was this ever published? (Orego is using this newer formula, and it appears to work well.)


Peter Drake
http://www.lclark.edu/~drake/



