On 06/20/2010 02:47 AM, Petr Baudis wrote:
Hi!
On Sun, Jun 20, 2010 at 02:23:30AM +0200, Rémi Coulom wrote:
I have just made a new version of QLR. Download and screenshots are
available from there:
http://remi.coulom.free.fr/QLR/
This new version has more features, like support for draws, and
replications. It also has a much better optimization algorithm.
I wanted to give it a try but I'd prefer knowing first how does it
work; do you plan to write up something about it in near future, or
should I rather dive into the source? :-) (Any pointers there?)
Thanks!
Petr "Pasky" Baudis
Hi,
I'll write a paper.
But I can give a short description of the algorithm, because it is very
simple. This algorithm is made of two independent part: maximum
estimation, and sampling policy.
/* Maximum estimation */
initialization: each sample is given a weight of 1
while not converged do:
compute a quadratic regression over weighted samples
estimate sample mean (mu) (Elo rating)
estimate deviation (sigma) of mu (ie, a confidence interval)
set weight of each sample to min(1, exp((r - mu) / (alpha * sigma)))
- r is the estimated rating according to the quadratic regression
- alpha is a meta-parameter of the algorithm
In fact, this iteration may not converge. In order to ensure
convergence, I bound weight changes by 0.1, and run a finite number of
iterations. Another simple way to ensure convergence is to never allow a
weight to increase.
In the current version of QLR, alpha is a constant equal to 3. I chose
this constant by testing extensively with a variety of artificial problems.
A constant alpha may not be asymptotically optimal. I had a very
interesting discussion with Eric Boesch about asymptotic performance. My
previous conjecture was O(n^{-3/5}), but Eric convinced me that it is
O(n^{-2/3}). Achieving optimal asymptotic performance might require to
have a value of alpha that changes over time. I'll have to study that.
/* Sampling policy */
sample within bounds with a probability density proportional to:
exp((r - mu) / (alpha * sigma))
In the past, I used a more clever sigma-optimal sampling policy. For
non-quadratic functions, this more uniform sampling policy is more
efficient.
Rémi
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go