In the GP setting I'm using, K is the set of pair wise gaussian kernel evaluations between all the points in my data. So if a point is x_i,z_i (x is the dependent variable and the z's are effectively explanatory variables), K_ij is more or less exp(-.5 * (z_i - z_j)^2 * theta).
I don't think the problem is convex, but MATLAB's fminunc + tens of random starting values always converge to the same point. Anyway, a previous commenter pointed out that LU factorization is implemented for dual numbers, so I'll give that a try. > On Jun 3, 2014, at 7:15 AM, Tony Kelman <[email protected]> wrote: > > Alright if K is dense then Ipopt may not be your best choice. How do the > theta parameters enter into K(theta)? Is the problem convex? Would the dual > problem be easier to work with? Have you read Boyd and Vandenberghe more > recently than I have? This sounds suspiciously like it can be expressed as a > semidefinite program, though I think the Julia interfaces to good SDP solvers > are still WIP. > > Yes the majority of solvers for convex optimization rely on a Cholesky > decomposition at some point, the question is whether or not you pose the > problem in such a way that you have to differentiate the factors. I suspect > you shouldn't have to, but also haven't worked with this particular type of > problem in a few years. > > >> On Tuesday, June 3, 2014 4:33:01 AM UTC-7, Thomas Covert wrote: >> Thanks for the suggestion, Tony. The problem I'm working with is that of >> fitting a gaussian process to data, so my objective function (the log >> likelihood of a GP given some data x) includes both an x' inv(K(theta)) x >> term and a log determinant of K(theta). Here K() is a big dense positive >> definite weight matrix and theta is the parameter(s) controlling the weights. >> >> As far as I can tell, the fastest and most numerically stable way to >> implement these calculations is with a cholesky decomposition of K(theta). >> In my code where I've already done the work of finding derivatives by hand, >> I use a PDMat for K and the associated fast routines (like invquad and >> logdet) available for PDMats. To use the DualNumbers autodiff code, I >> needed a way to cholesky factorize a symmetric matrix of dual numbers (which >> I assume was something like K(theta + epsilon) in the autodiff code). >> >> However if you think there's a smarter way to do this kind of calculation >> that avoids a cholesky factorization I would definitely take a look. >> >>> On Jun 3, 2014, at 5:41 AM, Tony Kelman <[email protected]> wrote: >>> >>> Whoops, didn't finish writing a sentence there... was saying that most of >>> the algorithms in Optim don't implement equality constraints, but you've >>> got other choices of optimization solvers. >>> >>> >>>> On Tuesday, June 3, 2014 3:39:44 AM UTC-7, Tony Kelman wrote: >>>> Thomas, >>>> >>>> If you're trying to invert (factorize) a matrix as part of your objective >>>> function, it's generally a much better idea to reformulate your >>>> optimization problem and use a different algorithm. Anywhere that you're >>>> essentially trying to use inv(A)*x, you should try introducing a new >>>> variable y and corresponding linear equality constraint x = A * y. Most of >>>> the algorithms in Optim If your objective function in terms of x and y is >>>> linear, then you can use Clp, GLPK, etc. If it's still some complicated >>>> nonlinear thing even in terms of x and y, use JuMP with Ipopt. NLopt might >>>> work too, but it's not tied into the convenient JuMP reverse-mode autodiff >>>> framework yet. >>>> >>>> -Tony >>>> >>>> >>>>> On Monday, June 2, 2014 11:43:32 AM UTC-7, Thomas Covert wrote: >>>>> I'm really excited to use DualNumbers and the autodiff option in >>>>> Optim.jl. However, the functions I would like to optimize often make use >>>>> of linear algebra, including cholesky factorizations. At present, >>>>> DualNumbers.jl does not implement a cholesky factorization. My first >>>>> pass at a workaround for this was to just write out a standard cholesky >>>>> solver (i.e., just writing out the equations listed in wikipedia). This >>>>> seems to work fine - i.e., my "choldn()" function factors a DualNumber >>>>> array into upper and lower triangular matrices whose product is the >>>>> input. >>>>> >>>>> However, my "choldn" function listed below is MUCH slower than the LAPACK >>>>> call built into chol(). I can't say I'm surprised by this, but the speed >>>>> hit makes the whole enterprise of using autodiff with DualNumbers >>>>> considerably less useful. >>>>> >>>>> I was hoping to find some neat linear algebra trick that would let me >>>>> compute a DualNumber cholesky factorization without having to resort to >>>>> non-LAPACK code, but I haven't found it yet. That is, I figured that I >>>>> could compute the cholesky factorization of the real part of the matrix >>>>> and then separately compute a matrix which represented the DualNumber >>>>> part. However, I've not yet found a math text describing such a beast. >>>>> >>>>> Any ideas? Am I wrong in thinking that a Cholesky decomposition of a >>>>> square array with DualNumbers is a well-defined mathematical object? >>>>> >>>>> I've included my code for "choldn()" below. >>>>> >>>>> Thanks in advance. >>>>> >>>>> >>>>> CODE >>>>> >>>>> function choldn!(A::Array) >>>>> N = size(A,1); >>>>> T = eltype(A); >>>>> tmp = zero(T); >>>>> for j = 1:N # cols >>>>> >>>>> for i = j:N # rows >>>>> >>>>> if i == j >>>>> if i == 1 >>>>> A[i,j] = sqrt(A[i,j]); >>>>> else >>>>> tmp = zero(T); >>>>> for k = 1:j-1 >>>>> tmp += A[j,k] ^ 2; >>>>> end >>>>> A[i,j] = sqrt(A[i,j] - tmp); >>>>> end >>>>> else >>>>> if j == 1 >>>>> A[i,j] = A[i,j] / A[j,j]; >>>>> else >>>>> tmp = zero(T); >>>>> for k = 1:j-1 >>>>> tmp += A[i,k] * A[j,k]; >>>>> end >>>>> A[i,j] = (A[i,j] - tmp) / A[j,j]; >>>>> end >>>>> end >>>>> end >>>>> end >>>>> # then zero out the non-cholesky stuff >>>>> >>>>> for i = 1:N >>>>> if i < N >>>>> for j = i+1:N >>>>> A[i,j] = zero(T); >>>>> end >>>>> end >>>>> end >>>>> end >>>>>
