On 06/20/2010 02:47 AM, Petr Baudis wrote:
   Hi!

On Sun, Jun 20, 2010 at 02:23:30AM +0200, Rémi Coulom wrote:
I have just made a new version of QLR. Download and screenshots are
available from there:
http://remi.coulom.free.fr/QLR/

This new version has more features, like support for draws, and
replications. It also has a much better optimization algorithm.

   I wanted to give it a try, but I'd prefer to know first how it
works; do you plan to write up something about it in the near future, or
should I rather dive into the source? :-) (Any pointers there?)

   Thanks!

                                Petr "Pasky" Baudis


Hi,

I'll write a paper.

But I can give a short description of the algorithm, because it is very simple. This algorithm consists of two independent parts: maximum estimation, and a sampling policy.

/* Maximum estimation */
initialization: give each sample a weight of 1
while not converged do:
  compute a quadratic regression over the weighted samples
  estimate the mean rating mu (Elo rating)
  estimate the deviation sigma of mu (i.e., a confidence interval)
  set the weight of each sample to min(1, exp((r - mu) / (alpha * sigma)))
   - r is the sample's estimated rating according to the quadratic regression
   - alpha is a meta-parameter of the algorithm

In fact, this iteration may not converge. In order to ensure convergence, I bound weight changes by 0.1, and run a finite number of iterations. Another simple way to ensure convergence is to never allow a weight to increase.
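The loop above can be sketched in Python. This is only an illustration of the pseudocode, not the actual QLR code: the objective function, the parameter range, and in particular the sigma estimate (a crude residual-based stand-in for the real confidence interval on mu) are my own assumptions. It uses the bounded weight changes mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 3.0  # meta-parameter, constant as in the current version of QLR

# hypothetical objective: the true rating is quadratic in the parameter x
def noisy_rating(x):
    return -2.0 * (x - 0.3) ** 2 + 1.0 + rng.normal(0.0, 0.2)

x = rng.uniform(-1.0, 1.0, 200)
y = np.array([noisy_rating(v) for v in x])
w = np.ones_like(x)  # initialization: every sample has weight 1

for _ in range(50):  # finite number of iterations, as in the mail
    a, b, c = np.polyfit(x, y, 2, w=w)  # weighted quadratic regression
    r = a * x ** 2 + b * x + c          # estimated rating of each sample
    mu = c - b ** 2 / (4 * a)           # value at the vertex of the quadratic (a < 0)
    # crude stand-in for the confidence interval on mu (NOT the QLR estimator):
    sigma = np.sqrt(np.average((y - r) ** 2, weights=w) / w.sum())
    target = np.minimum(1.0, np.exp((r - mu) / (alpha * sigma)))
    w += np.clip(target - w, -0.1, 0.1)  # bound weight changes by 0.1

x_opt = -b / (2 * a)  # parameter value where the fitted quadratic peaks
```

The reweighting down-weights samples whose estimated rating r is far below mu, so the quadratic fit progressively concentrates on the region near the maximum.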

In the current version of QLR, alpha is a constant equal to 3. I chose this constant by testing extensively with a variety of artificial problems.

A constant alpha may not be asymptotically optimal. I had a very interesting discussion with Eric Boesch about asymptotic performance. My previous conjecture was O(n^{-3/5}), but Eric convinced me that it is O(n^{-2/3}). Achieving optimal asymptotic performance might require a value of alpha that changes over time. I'll have to study that.

/* Sampling policy */
sample within bounds with a probability density proportional to:
exp((r - mu) / (alpha * sigma))

In the past, I used a more clever sigma-optimal sampling policy, but for non-quadratic functions this more uniform sampling policy turns out to be more efficient.
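Since the density exp((r - mu) / (alpha * sigma)) is bounded by 1 whenever r <= mu, the policy can be sketched with simple rejection sampling. The fitted quadratic, sigma, and the bounds below are hypothetical placeholders; in QLR they would come from the maximum-estimation step.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, sigma = 3.0, 0.5          # sigma would come from the regression step
a, b, c = -2.0, 1.2, 1.0         # hypothetical fitted quadratic (a < 0)
lo, hi = -1.0, 1.0               # parameter bounds

mu = c - b ** 2 / (4 * a)        # maximum value of the quadratic

def density(x):
    r = a * x ** 2 + b * x + c   # estimated rating at x
    return np.exp((r - mu) / (alpha * sigma))  # in (0, 1] since r <= mu

def draw_sample():
    # rejection sampling: propose uniformly within bounds, accept with
    # probability equal to the (bounded) density
    while True:
        x = rng.uniform(lo, hi)
        if rng.uniform() < density(x):
            return x

xs = np.array([draw_sample() for _ in range(1000)])
```

Samples concentrate around the maximum of the fitted quadratic, but the flat exponential tail keeps some exploration away from it.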

Rémi
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go