[computer-go] New UCT-RAVE formula (was Re: computer-go Digest, Vol 43, Issue 8)

David Silver Sat, 16 Feb 2008 10:31:16 -0800

> I am very confused about the new UCT-RAVE formula.
> Equation 9 seems to mean:
>
> variance_u = value_ur * (1 - value_ur) / n.
>
> Is it wrong? If it is correct, why is it the variance?
> I think that the variance of the UCT value should be:
>
> variance_u = value_u * (1 - value_u).

Hi Yamato,

There are two differences between your suggestion and the original formula, so I'll try to address both:

1. Your formula gives the variance of a single simulation, with probability value_u. But the more simulations you see, the more you reduce the uncertainty, so you must divide by n.

In general, the variance of a single coin-flip (with probability p of heads) is p(1-p).

The variance of the sum of n coin-flips is np(1-p).

The variance of the average of n coin-flips is p(1-p)/n. This is what we want!

2. The variance of the estimate is actually given by:

variance_u = true_value_u * (1 - true_value_u) / n,

where true_value_u is the real probability of winning a simulation (for the current policy), if we had access to an oracle. Unfortunately, we don't, so we use the best available estimate. If we have seen a large number of simulations, then you are right that value_u is the best estimate. But if we have only seen a few simulations, then value_r gives a better estimate (this is the point of RAVE!). The whole point of this approach is to form the best possible estimate of true_value_u by combining these two estimates into value_ur. In a way this is somewhat circular: we use the best estimate so far to compute the best new estimate. But I don't think that is unreasonable in this case.
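Both points can be checked numerically. The Python sketch below is not part of the original message. It first confirms by simulation that the average of n coin-flips has variance p(1-p)/n. It then combines a UCT estimate and a RAVE estimate into value_ur, using that variance with the best estimate so far plugged in for true_value_u. The names n_u, n_r, bias and previous_value_ur, and all function names, are hypothetical. The weighting rule is an assumption made for illustration, not necessarily the exact derivation behind equation 9. That rule chooses beta to minimise the mean-squared error of (1 - beta) * value_u + beta * value_r, treats the two estimates as independent, and allows value_r an optional bias.

```python
import random
import statistics


def variance_of_mean(p: float, n: int) -> float:
    """Variance of the average of n independent coin-flips with P(heads) = p.

    One flip has variance p(1-p); the sum of n flips has n*p(1-p);
    dividing the sum by n divides its variance by n*n, giving p(1-p)/n.
    """
    return p * (1.0 - p) / n


def empirical_variance_of_mean(p: float, n: int, trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo check: repeat the n-flip experiment and measure the spread of the averages."""
    rng = random.Random(seed)
    averages = [sum(rng.random() < p for _ in range(n)) / n for _ in range(trials)]
    return statistics.pvariance(averages)


def rave_weight(best_estimate: float, n_u: int, n_r: int, bias: float = 0.0) -> float:
    """Hypothetical weight on value_r minimising the MSE of (1 - beta)*value_u + beta*value_r.

    Assumes the two estimates are independent, value_u is unbiased and value_r
    may carry a bias `bias`. Both variances plug in best_estimate for the
    unknown true_value_u, as described in the message.
    """
    var_u = variance_of_mean(best_estimate, n_u)
    var_r = variance_of_mean(best_estimate, n_r)
    denominator = var_u + var_r + bias ** 2
    if denominator == 0.0:
        # best_estimate is exactly 0 or 1 and there is no bias: weight by counts.
        return n_r / (n_u + n_r)
    return var_u / denominator


def update_value_ur(value_u: float, n_u: int, value_r: float, n_r: int,
                    previous_value_ur: float, bias: float = 0.0) -> float:
    """Combine the UCT and RAVE estimates, using the best estimate so far for the variances."""
    if n_u == 0:
        return value_r  # no UCT simulations yet: rely on RAVE alone
    if n_r == 0:
        return value_u  # no RAVE data: rely on UCT alone
    beta = rave_weight(previous_value_ur, n_u, n_r, bias)
    return (1.0 - beta) * value_u + beta * value_r


if __name__ == "__main__":
    # Formula versus simulation for p = 0.6, n = 25.
    print(variance_of_mean(0.6, 25), empirical_variance_of_mean(0.6, 25))
    # 10 UCT simulations at 40% wins, 30 RAVE samples at 60% wins.
    print(update_value_ur(0.4, 10, 0.6, 30, previous_value_ur=0.5))
    print(update_value_ur(0.4, 10, 0.6, 30, previous_value_ur=0.5, bias=0.1))
```

With bias = 0 the factor best_estimate * (1 - best_estimate) cancels and beta reduces to n_r / (n_u + n_r). In that case the plugged-in estimate only changes the weight once a bias term is included. With the example numbers and no bias, beta is 0.75 and value_ur comes out at about 0.55. Adding a bias pulls the weight back toward value_u, and the weight also falls as n_u grows.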


-Dave

_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/

