Thanks for the suggestion, Tony. The problem I'm working with is that of fitting a gaussian process to data, so my objective function (the log likelihood of a GP given some data x) includes both an x' inv(K(theta)) x term and a log determinant of K(theta). Here K() is a big dense positive definite weight matrix and theta is the parameter(s) controlling the weights.
As far as I can tell, the fastest and most numerically stable way to implement these calculations is with a cholesky decomposition of K(theta). In my code where I've already done the work of finding derivatives by hand, I use a PDMat for K and the associated fast routines (like invquad and logdet) available for PDMats. To use the DualNumbers autodiff code, I needed a way to cholesky factorize a symmetric matrix of dual numbers (which I assume was something like K(theta + epsilon) in the autodiff code). However if you think there's a smarter way to do this kind of calculation that avoids a cholesky factorization I would definitely take a look. > On Jun 3, 2014, at 5:41 AM, Tony Kelman <[email protected]> wrote: > > Whoops, didn't finish writing a sentence there... was saying that most of the > algorithms in Optim don't implement equality constraints, but you've got > other choices of optimization solvers. > > >> On Tuesday, June 3, 2014 3:39:44 AM UTC-7, Tony Kelman wrote: >> Thomas, >> >> If you're trying to invert (factorize) a matrix as part of your objective >> function, it's generally a much better idea to reformulate your optimization >> problem and use a different algorithm. Anywhere that you're essentially >> trying to use inv(A)*x, you should try introducing a new variable y and >> corresponding linear equality constraint x = A * y. Most of the algorithms >> in Optim If your objective function in terms of x and y is linear, then you >> can use Clp, GLPK, etc. If it's still some complicated nonlinear thing even >> in terms of x and y, use JuMP with Ipopt. NLopt might work too, but it's not >> tied into the convenient JuMP reverse-mode autodiff framework yet. >> >> -Tony >> >> >>> On Monday, June 2, 2014 11:43:32 AM UTC-7, Thomas Covert wrote: >>> I'm really excited to use DualNumbers and the autodiff option in Optim.jl. >>> However, the functions I would like to optimize often make use of linear >>> algebra, including cholesky factorizations. At present, DualNumbers.jl >>> does not implement a cholesky factorization. My first pass at a workaround >>> for this was to just write out a standard cholesky solver (i.e., just >>> writing out the equations listed in wikipedia). This seems to work fine - >>> i.e., my "choldn()" function factors a DualNumber array into upper and >>> lower triangular matrices whose product is the input. >>> >>> However, my "choldn" function listed below is MUCH slower than the LAPACK >>> call built into chol(). I can't say I'm surprised by this, but the speed >>> hit makes the whole enterprise of using autodiff with DualNumbers >>> considerably less useful. >>> >>> I was hoping to find some neat linear algebra trick that would let me >>> compute a DualNumber cholesky factorization without having to resort to >>> non-LAPACK code, but I haven't found it yet. That is, I figured that I >>> could compute the cholesky factorization of the real part of the matrix and >>> then separately compute a matrix which represented the DualNumber part. >>> However, I've not yet found a math text describing such a beast. >>> >>> Any ideas? Am I wrong in thinking that a Cholesky decomposition of a >>> square array with DualNumbers is a well-defined mathematical object? >>> >>> I've included my code for "choldn()" below. >>> >>> Thanks in advance. >>> >>> >>> CODE >>> >>> function choldn!(A::Array) >>> N = size(A,1); >>> T = eltype(A); >>> tmp = zero(T); >>> for j = 1:N # cols >>> >>> for i = j:N # rows >>> >>> if i == j >>> if i == 1 >>> A[i,j] = sqrt(A[i,j]); >>> else >>> tmp = zero(T); >>> for k = 1:j-1 >>> tmp += A[j,k] ^ 2; >>> end >>> A[i,j] = sqrt(A[i,j] - tmp); >>> end >>> else >>> if j == 1 >>> A[i,j] = A[i,j] / A[j,j]; >>> else >>> tmp = zero(T); >>> for k = 1:j-1 >>> tmp += A[i,k] * A[j,k]; >>> end >>> A[i,j] = (A[i,j] - tmp) / A[j,j]; >>> end >>> end >>> end >>> end >>> # then zero out the non-cholesky stuff >>> >>> for i = 1:N >>> if i < N >>> for j = i+1:N >>> A[i,j] = zero(T); >>> end >>> end >>> end >>> end >>>
