Why are m and n different?  Isn't every playout used both to update the UCT
win rate and the RAVE values for the same nodes?  Won't the number of UCT
simulations and the number of RAVE simulations be the same?

David

From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of David Silver
Sent: Friday, February 08, 2008 3:40 PM
To: [email protected]
Subject: [computer-go] New UCT-RAVE formula (was Re: computer-go Digest, Vol
43, Issue 8)

Hi Jason,

The original paper's formula for beta always felt wrong to me.  I like this
new one a lot better.

Good! Me too :-)

It makes lots of assumptions that are not true in practice, but at least it
is based on a sound principle!

Is it correct that the PDF assumes a UCT bias of zero?

You could be asking one of two things, so I will try and answer both :-)

1. Yes, you are correct that we are assuming a UCT bias of zero.

2. No, the assumption itself is not correct. The true value of a node in the
tree is 0 or 1, given perfect play. So the UCT value (which just averages
the outcomes of simulations) is significantly biased.
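
To make the bias in point 2 concrete, here is a minimal sketch in Python. It
assumes a made-up two-ply game (the LEAVES table and both function names are
hypothetical, not from the paper) and plain averaging of uniformly random
playouts, which is a simplification of what UCT does inside the tree.

```python
# Hypothetical two-ply game: the root player moves, then the opponent replies.
# Each leaf is the outcome for the root player (1 = win, 0 = loss).
LEAVES = [[1, 1], [1, 0]]


def minimax_value(leaves):
    """Value under perfect play: opponent minimises, root player maximises."""
    return max(min(replies) for replies in leaves)


def mean_playout_value(leaves):
    """Expected outcome of uniformly random playouts from the root.

    Every leaf is equally likely here because all branches have the same
    number of replies (an assumption of this toy example).
    """
    outcomes = [value for replies in leaves for value in replies]
    return sum(outcomes) / len(outcomes)


true_value = minimax_value(LEAVES)      # 0 or 1 by construction
averaged = mean_playout_value(LEAVES)   # somewhere in between
bias = averaged - true_value

print(true_value, averaged, bias)  # prints: 1 0.75 -0.25
```

In this toy game the root is a win under perfect play, yet the averaged
playout value is 0.75, so the estimate is off by -0.25 however many
simulations are run.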

Calculation of the MSE seems to assume this going into the last step but
doesn't simplify life by doing it in the first reduction...

_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/
