Alright if K is dense then Ipopt may not be your best choice. How do the theta parameters enter into K(theta)? Is the problem convex? Would the dual problem be easier to work with? Have you read Boyd and Vandenberghe more recently than I have? This sounds suspiciously like it can be expressed as a semidefinite program, though I think the Julia interfaces to good SDP solvers are still WIP.
Yes the majority of solvers for convex optimization rely on a Cholesky decomposition at some point, the question is whether or not you pose the problem in such a way that you have to differentiate the factors. I suspect you shouldn't have to, but also haven't worked with this particular type of problem in a few years. On Tuesday, June 3, 2014 4:33:01 AM UTC-7, Thomas Covert wrote: > > Thanks for the suggestion, Tony. The problem I'm working with is that of > fitting a gaussian process to data, so my objective function (the log > likelihood of a GP given some data x) includes both an x' inv(K(theta)) x > term and a log determinant of K(theta). Here K() is a big dense positive > definite weight matrix and theta is the parameter(s) controlling the > weights. > > As far as I can tell, the fastest and most numerically stable way to > implement these calculations is with a cholesky decomposition of K(theta). > In my code where I've already done the work of finding derivatives by > hand, I use a PDMat for K and the associated fast routines (like invquad > and logdet) available for PDMats. To use the DualNumbers autodiff code, I > needed a way to cholesky factorize a symmetric matrix of dual numbers > (which I assume was something like K(theta + epsilon) in the autodiff code). > > However if you think there's a smarter way to do this kind of calculation > that avoids a cholesky factorization I would definitely take a look. > > On Jun 3, 2014, at 5:41 AM, Tony Kelman <[email protected] <javascript:>> > wrote: > > Whoops, didn't finish writing a sentence there... was saying that most of > the algorithms in Optim don't implement equality constraints, but you've > got other choices of optimization solvers. > > > On Tuesday, June 3, 2014 3:39:44 AM UTC-7, Tony Kelman wrote: >> >> Thomas, >> >> If you're trying to invert (factorize) a matrix as part of your objective >> function, it's generally a much better idea to reformulate your >> optimization problem and use a different algorithm. Anywhere that you're >> essentially trying to use inv(A)*x, you should try introducing a new >> variable y and corresponding linear equality constraint x = A * y. Most of >> the algorithms in Optim If your objective function in terms of x and y is >> linear, then you can use Clp, GLPK, etc. If it's still some complicated >> nonlinear thing even in terms of x and y, use JuMP with Ipopt. NLopt might >> work too, but it's not tied into the convenient JuMP reverse-mode autodiff >> framework yet. >> >> -Tony >> >> >> On Monday, June 2, 2014 11:43:32 AM UTC-7, Thomas Covert wrote: >>> >>> I'm really excited to use DualNumbers and the autodiff option in >>> Optim.jl. However, the functions I would like to optimize often make use >>> of linear algebra, including cholesky factorizations. At present, >>> DualNumbers.jl does not implement a cholesky factorization. My first pass >>> at a workaround for this was to just write out a standard cholesky solver >>> (i.e., just writing out the equations listed in wikipedia). This seems to >>> work fine - i.e., my "choldn()" function factors a DualNumber array into >>> upper and lower triangular matrices whose product is the input. >>> >>> However, my "choldn" function listed below is MUCH slower than the >>> LAPACK call built into chol(). I can't say I'm surprised by this, but the >>> speed hit makes the whole enterprise of using autodiff with DualNumbers >>> considerably less useful. >>> >>> I was hoping to find some neat linear algebra trick that would let me >>> compute a DualNumber cholesky factorization without having to resort to >>> non-LAPACK code, but I haven't found it yet. That is, I figured that I >>> could compute the cholesky factorization of the real part of the matrix and >>> then separately compute a matrix which represented the DualNumber part. >>> However, I've not yet found a math text describing such a beast. >>> >>> Any ideas? Am I wrong in thinking that a Cholesky decomposition of a >>> square array with DualNumbers is a well-defined mathematical object? >>> >>> I've included my code for "choldn()" below. >>> >>> Thanks in advance. >>> >>> >>> CODE >>> >>> function choldn!(A::Array) >>> >>> N = size(A,1); >>> >>> T = eltype(A); >>> >>> tmp = zero(T); >>> >>> for j = 1:N # cols >>> >>> >>> for i = j:N # rows >>> >>> >>> if i == j >>> >>> if i == 1 >>> >>> A[i,j] = sqrt(A[i,j]); >>> >>> else >>> >>> tmp = zero(T); >>> >>> for k = 1:j-1 >>> >>> tmp += A[j,k] ^ 2; >>> >>> end >>> >>> A[i,j] = sqrt(A[i,j] - tmp); >>> >>> end >>> >>> else >>> >>> if j == 1 >>> >>> A[i,j] = A[i,j] / A[j,j]; >>> >>> else >>> >>> tmp = zero(T); >>> >>> for k = 1:j-1 >>> >>> tmp += A[i,k] * A[j,k]; >>> >>> end >>> >>> A[i,j] = (A[i,j] - tmp) / A[j,j]; >>> >>> end >>> >>> end >>> >>> end >>> >>> end >>> >>> # then zero out the non-cholesky stuff >>> >>> >>> for i = 1:N >>> >>> if i < N >>> >>> for j = i+1:N >>> >>> A[i,j] = zero(T); >>> >>> end >>> >>> end >>> >>> end >>> >>> end >>> >>>
