I use the first formula, with k equal to 500.
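
For comparison, here is a minimal Python sketch of the three weight schedules discussed in the quoted message below: the Gelly and Silver form, the clamped linear form, and the simpler k / (n(s) + k). The function names are hypothetical, chosen only for illustration, and the default k = 500 is just the value mentioned above, not a recommendation from either paper.

```python
import math


def weight_sqrt(n, k=500.0):
    """Gelly and Silver form: sqrt(k / (3*n + k)). (Function name is illustrative.)"""
    return math.sqrt(k / (3.0 * n + k))


def weight_linear(n, k=500.0):
    """Clamped linear form: (k - n) / k, or 0 if that is negative."""
    return max(0.0, (k - n) / k)


def weight_simple(n, k=500.0):
    """The simpler alternative asked about: k / (n + k)."""
    return k / (n + k)


# Print the RAVE weight under each schedule for a few playout counts n(s).
for n in (0, 100, 500, 2000, 10000):
    print(n,
          round(weight_sqrt(n), 3),
          round(weight_linear(n), 3),
          round(weight_simple(n), 3))
```

All three start at 1 when n(s) = 0 and decrease as playouts accumulate. The linear form reaches exactly zero at n(s) = k, while the other two only approach zero.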

> -----Original Message-----
> From: computer-go-boun...@computer-go.org [mailto:computer-go-
> boun...@computer-go.org] On Behalf Of Peter Drake
> Sent: Monday, September 14, 2009 10:59 AM
> To: Computer Go
> Subject: [computer-go] Conflicting RAVE formulae
> 
> Gelly and Silver ("Combining Online and Offline Knowledge in UCT",
> section 6) give this formula for the weight given to RAVE values (as
> opposed to the direct MC values):
> 
> sqrt(k / (3*n(s) + k))
> 
> Here, k is a constant and n(s) is the number of playouts through state
> s. Clearly, as the number of playouts increases, this approaches zero.
> 
> Helmbold and Parker-Wood ("All-Moves-As-First Heuristics in Monte-Carlo
> Go") cite the Gelly and Silver paper, but give a different formula!
> Adjusting for notation, they use:
> 
> (k - n(s)) / k, or 0 if this expression is negative
> 
> This also converges toward (and then sticks at) zero, but it is not
> the same formula.
> 
> Why are they different? Does it matter? Is there an explanation
> anywhere for Gelly and Silver's more elaborate formula? Is there
> anything wrong with k / (n(s) + k)?
> 
> On a related note, in a message on this list, David Silver gives a
> newer formula:
> 
> http://computer-go.org/pipermail/computer-go/2009-May/018251.html
> 
> Was this ever published? (Orego is using this newer formula, and it
> appears to work well.)
> 
> Peter Drake
> http://www.lclark.edu/~drake/
> 
> 
> 
> _______________________________________________
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/

