I was doing an in-depth review of Hunter's paper over the extended weekend... I think I now see the value of the MM algorithms, and I understand the divergence problem.
MM has the nice property that convergence is guaranteed. Newton-Raphson (N-R) carries no such guarantee. The paper includes a simple (83 variable?) example where N-R fails. That doesn't mean N-R has no value, but any self-respecting implementation may need to replace an N-R step with an MM step whenever N-R fails to increase the objective function (Hunter calls this modified Newton-Raphson).

There are a few critical properties of a minorizing function. Below I use Q to represent the minorizing function and L to represent the original objective function.

1. It must be guaranteed that Q(old x) == L(old x)
2. It must be guaranteed that L(new x) >= Q(new x)
3. It must be guaranteed that Q(new x) > Q(old x)

Combining those: L(new x) >= Q(new x) > Q(old x) == L(old x), i.e. L(new x) > L(old x), ensuring forward progress.

Properties 1 & 2 are the basic requirements for a minorizing function. #3 is a criterion for choosing between candidate minorizing functions: the math of Q must stay simple enough to achieve it. Usually that means defining a function that is easy to maximize. In practice it further translates to the partial derivative of Q with respect to each variable being independent of all the other variables being adjusted. Without that, optimizing the variables independently (and doing a batch update) breaks property 3, and the convergence guarantee is lost.

This completely explains the divergence Rémi observed and rationalizes why his heuristic is correct (don't update more than one member of any team). That restores the independence that makes finding a true maximum of Q easy.

I've convinced myself that exactly replicating Rémi's solution is the correct first step in doing the Elo computations. If speed is a factor, my second step may be to compute multiple candidate updates (such as N-R) and then use whichever gives the largest increase in the objective; a rough sketch of that idea is appended below the quoted message.

On Dec 21, 2007 2:07 PM, Rémi Coulom <[EMAIL PROTECTED]> wrote:
> Jason House wrote:
> > For each iteration of batch updates, do you pass over the data
> > computing whatever you can and then pick a set of mutually exclusive
> > items to update? How bad would it be to update everything, including
> > those on the same team?
>
> The algorithm may not converge. In the beginning, when I had only a few
> features, I updated everything at the same time, and it worked. But when
> I added more features, it diverged.
>
>
> One advantage of MM is that it has proved convergence. It is
> difficult to get such a guarantee with approximate Newton.
>
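P.S. To make the step logic concrete, here is a rough Python sketch of what I have in mind, for the plain Bradley-Terry case only (one player per side, no teams of features, so this is not Rémi's generalization). The minorizing function Q comes from bounding each -log term of L by its tangent line at the old gamma, which makes Q separable and easy to maximize. The function names, the toy data, and the newton_step placeholder are my own illustration, not code from either paper.

    import math

    def log_likelihood(gamma, games):
        # Bradley-Terry log-likelihood L(gamma) for a list of (winner, loser) pairs.
        return sum(math.log(gamma[w]) - math.log(gamma[w] + gamma[l])
                   for w, l in games)

    def mm_update(gamma, games):
        # One MM pass: each -log(gamma_w + gamma_l) term of L is replaced by its
        # tangent line at the old gamma (a convex function lies above its
        # tangents, so the tangent is a valid lower bound).  The surrogate Q is
        # then separable in the gamma_i and has the closed-form maximizer
        #   gamma_i = W_i / sum over i's games of 1/(old gamma_i + old gamma_j),
        # which is the simple Bradley-Terry MM update from Hunter's paper.
        n = len(gamma)
        wins = [0.0] * n
        denom = [0.0] * n
        for w, l in games:
            inv_s = 1.0 / (gamma[w] + gamma[l])   # evaluated at the *old* gamma
            wins[w] += 1.0
            denom[w] += inv_s
            denom[l] += inv_s
        new = [wins[i] / denom[i] if denom[i] > 0 else gamma[i] for i in range(n)]
        scale = sum(new) / n        # L depends only on ratios, so rescaling is harmless
        return [g / scale for g in new]

    def robust_step(gamma, games, newton_step):
        # The "modified N-R" / best-of idea from above: compute several candidate
        # updates and keep whichever increases L the most.  Because the MM
        # candidate is always in the pool, the guaranteed increase (and hence
        # convergence) is preserved.  newton_step is a placeholder for whatever
        # N-R update we end up writing.
        candidates = [mm_update(gamma, games), newton_step(gamma, games)]
        return max(candidates, key=lambda g: log_likelihood(g, games))

    # Tiny check: player 0 wins 3 of 4 games against player 1, so the
    # maximum-likelihood ratio gamma[0]/gamma[1] should approach 3.
    games = [(0, 1), (0, 1), (0, 1), (1, 0)]
    gamma = [1.0, 1.0]
    for _ in range(20):
        gamma = mm_update(gamma, games)
    print(gamma[0] / gamma[1])   # ~3.0

The team case would change mm_update (the tangent bound goes around the sum of each team's gammas, which is where the one-member-per-team restriction comes in), but the robust_step driver would stay the same.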
