I was doing an in-depth review of Hunter's paper over the extended
weekend...  I think I now see the value of the MM algorithms and
understand the divergence problem.

MM has the nice property that convergence is guaranteed.  Newton-Raphson
(N-R) carries no such guarantee.  The paper includes a simple (83 variable?)
example where N-R fails.  That doesn't mean N-R has no value, but any
self-respecting implementation may need to replace an N-R step with an MM
step whenever N-R fails to increase the objective function (Hunter calls
this modified Newton-Raphson).
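
As a rough sketch of that fallback logic (newton_step, mm_step, and
log_likelihood are hypothetical helpers; this isn't Hunter's code, just
how I picture the "modified" part):

def modified_newton_step(x, log_likelihood, newton_step, mm_step):
    # Try the (approximate) Newton-Raphson update first.
    candidate = newton_step(x)
    if log_likelihood(candidate) > log_likelihood(x):
        return candidate
    # N-R failed to increase the objective; fall back to the MM update,
    # which is guaranteed not to decrease it.
    return mm_step(x)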

There are a few critical properties of a minorizing function.  Below I use
Q to represent the minorized version, and L to represent the original
objective function.
 1. It must be guaranteed that Q( old x ) == L( old x )
 2. It must be guaranteed that L( new x ) >= Q( new x )
 3. It must be guaranteed that Q( new x ) > Q( old x )

Combining those together:
L(new x) >= Q(new x) > Q(old x) = L(old x)
... or L(new x) > L(old x), ensuring forward progress.
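
To make that concrete: the minorization Hunter uses for the troublesome
-ln(gamma_i + gamma_j) terms (if I'm reading the paper right) is the
supporting-line bound -ln(x) >= 1 - x/x0 - ln(x0), with equality at
x == x0.  A quick numeric check of properties 1 and 2 for that building
block:

import math

def L_term(x):
    return -math.log(x)

def Q_term(x, x0):
    # Supporting-line minorizer of -ln(x), anchored at the old value x0.
    return 1.0 - x / x0 - math.log(x0)

x0 = 2.0
assert abs(Q_term(x0, x0) - L_term(x0)) < 1e-12   # property 1
for x in (0.5, 1.0, 3.0, 10.0):
    assert L_term(x) >= Q_term(x, x0)             # property 2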

Properties 1 & 2 are a basic requirement for minorizing functions.  #3 is a
criterion for selecting between candidate minorizing functions: to achieve
it, the math of the new function must remain easy.  Usually this means
defining a function that is easy to maximize.  In practice this is further
translated to mean that the partial derivative of Q with respect to each
variable is independent of all the other variables being adjusted.  Without
this, optimizing the variables independently (and doing a batch update)
breaks property 3 and the convergence guarantee is lost.
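
For the plain two-player Bradley-Terry case (no teams of features) that
separability is what gives the familiar closed-form MM update.  A minimal
sketch, assuming games is a list of (winner, loser) index pairs (this is
not Rémi's actual code, just the textbook form):

def mm_iteration(gamma, games):
    # One MM pass: gamma[i] <- wins[i] / sum over i's games of
    # 1 / (gamma[i] + gamma[opponent]), with the right-hand side
    # evaluated entirely at the OLD gammas, so a full batch update is safe.
    n = len(gamma)
    wins = [0] * n
    denom = [0.0] * n
    for winner, loser in games:
        wins[winner] += 1
        inv = 1.0 / (gamma[winner] + gamma[loser])
        denom[winner] += inv
        denom[loser] += inv
    return [wins[i] / denom[i] if denom[i] > 0 else gamma[i]
            for i in range(n)]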

This completely explains the divergence Rémi observed and rationalizes why
his heuristic is correct (don't update more than one member of any team).
That restores the independence that makes finding a true maximum of Q easy.
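
A sketch of how I imagine enforcing that restriction when picking a
batch (teams is assumed to be a list of feature sets, one per team
occurrence in the data; this is just my reading of the heuristic from
this thread, not Rémi's implementation):

def select_updatable(features, teams):
    # Greedily pick features so that no two selected features ever
    # appear together on the same team in this batch.
    selected = set()
    blocked = set()
    for f in features:
        if f in blocked:
            continue
        selected.add(f)
        for team in teams:
            if f in team:
                # Teammates of f have to wait for a later batch.
                blocked |= team
    return selected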

I've convinced myself that exactly replicating Rémi's solution is the
correct first step in doing the Elo computations.  If speed is a factor, my
second step may be to compute multiple update strategies (such as N-R) and
then use whichever gives the largest jump in the objective function.

On Dec 21, 2007 2:07 PM, Rémi Coulom <[EMAIL PROTECTED]> wrote:

> Jason House wrote:
> > For each iteration of batch updates, do you pass over the data
> > computing whatever you can and then pick a set of mutually exclusive
> > items to update?  How bad would it be to update everything, including
> > those on the same team?
>
> The algorithm may not converge. In the beginning, when I had only a few
> features, I updated everything at the same time, and it worked. But when
> I added more features, it diverged.
>
>
> One advantage of MM is that it has proved convergence. It is
> difficult to get such a guarantee with approximate Newton.
>
