On 06/20/2010 02:47 AM, Petr Baudis wrote:
   Hi!

On Sun, Jun 20, 2010 at 02:23:30AM +0200, Rémi Coulom wrote:
I have just made a new version of QLR. Download and screenshots are
available from there:
http://remi.coulom.free.fr/QLR/

This new version has more features, like support for draws, and
replications. It also has a much better optimization algorithm.

   I wanted to give it a try, but I'd prefer to know first how it
works; do you plan to write up something about it in the near future, or
should I rather dive into the source? :-) (Any pointers there?)

   Thanks!

                                Petr "Pasky" Baudis


Hi,

I'll write a paper.

But I can give a short description of the algorithm, because it is very simple. This algorithm consists of two independent parts: maximum estimation, and a sampling policy.

/* Maximum estimation */
initialization: give each sample a weight of 1
while not converged do:
  compute a quadratic regression over the weighted samples
  estimate the mean rating mu (Elo rating)
  estimate the deviation sigma of mu (i.e., a confidence interval)
  set the weight of each sample to min(1, exp((r - mu) / (alpha * sigma)))
   - r is the sample's estimated rating according to the quadratic regression
   - alpha is a meta-parameter of the algorithm

In fact, this iteration may not converge. In order to ensure convergence, I bound weight changes by 0.1, and run a finite number of iterations. Another simple way to ensure convergence is to never allow a weight to increase.
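The loop above can be sketched in Python. This is only an illustration of the pseudocode, not the actual QLR code: the objective function, the parameter range, and in particular the sigma estimate (a crude residual-based stand-in for the real confidence interval on mu) are my own assumptions. It uses the bounded weight changes mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 3.0  # meta-parameter, constant as in the current version of QLR

# hypothetical objective: the true rating is quadratic in the parameter x
def noisy_rating(x):
    return -2.0 * (x - 0.3) ** 2 + 1.0 + rng.normal(0.0, 0.2)

x = rng.uniform(-1.0, 1.0, 200)
y = np.array([noisy_rating(v) for v in x])
w = np.ones_like(x)  # initialization: every sample has weight 1

for _ in range(50):  # finite number of iterations, as in the mail
    a, b, c = np.polyfit(x, y, 2, w=w)  # weighted quadratic regression
    r = a * x ** 2 + b * x + c          # estimated rating of each sample
    mu = c - b ** 2 / (4 * a)           # value at the vertex of the quadratic (a < 0)
    # crude stand-in for the confidence interval on mu (NOT the QLR estimator):
    sigma = np.sqrt(np.average((y - r) ** 2, weights=w) / w.sum())
    target = np.minimum(1.0, np.exp((r - mu) / (alpha * sigma)))
    w += np.clip(target - w, -0.1, 0.1)  # bound weight changes by 0.1

x_opt = -b / (2 * a)  # parameter value where the fitted quadratic peaks
```

The reweighting down-weights samples whose estimated rating r is far below mu, so the quadratic fit progressively concentrates on the region near the maximum.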

In the current version of QLR, alpha is a constant equal to 3. I chose this constant by testing extensively with a variety of artificial problems.

A constant alpha may not be asymptotically optimal. I had a very interesting discussion with Eric Boesch about asymptotic performance. My previous conjecture was O(n^{-3/5}), but Eric convinced me that it is O(n^{-2/3}). Achieving optimal asymptotic performance might require a value of alpha that changes over time. I'll have to study that.

/* Sampling policy */
sample within bounds with a probability density proportional to:
exp((r - mu) / (alpha * sigma))

In the past, I used a more clever sigma-optimal sampling policy, but for non-quadratic functions this more uniform sampling policy turns out to be more efficient.
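Since the density exp((r - mu) / (alpha * sigma)) is bounded by 1 whenever r <= mu, the policy can be sketched with simple rejection sampling. The fitted quadratic, sigma, and the bounds below are hypothetical placeholders; in QLR they would come from the maximum-estimation step.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, sigma = 3.0, 0.5          # sigma would come from the regression step
a, b, c = -2.0, 1.2, 1.0         # hypothetical fitted quadratic (a < 0)
lo, hi = -1.0, 1.0               # parameter bounds

mu = c - b ** 2 / (4 * a)        # maximum value of the quadratic

def density(x):
    r = a * x ** 2 + b * x + c   # estimated rating at x
    return np.exp((r - mu) / (alpha * sigma))  # in (0, 1] since r <= mu

def draw_sample():
    # rejection sampling: propose uniformly within bounds, accept with
    # probability equal to the (bounded) density
    while True:
        x = rng.uniform(lo, hi)
        if rng.uniform() < density(x):
            return x

xs = np.array([draw_sample() for _ in range(1000)])
```

Samples concentrate around the maximum of the fitted quadratic, but the flat exponential tail keeps some exploration away from it.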

Rémi
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go