Thanks for the suggestion, Tony.  The problem I'm working with is that of 
fitting a gaussian process to data, so my objective function (the log 
likelihood of a GP given some data x) includes both an x' inv(K(theta)) x term 
and a log determinant of K(theta).  Here K() is a big dense  positive definite 
weight matrix and theta is the parameter(s) controlling the weights.

As far as I can tell, the fastest and most numerically stable way to implement 
these calculations is with a cholesky decomposition of K(theta).  In my code 
where I've already done the work of finding derivatives by hand, I use a PDMat 
for K and the associated fast routines (like invquad and logdet) available for 
PDMats.  To use the DualNumbers autodiff code, I needed a way to cholesky 
factorize a symmetric matrix of dual numbers (which I assume was something like 
K(theta + epsilon) in the autodiff code).

However if you think there's a smarter way to do this kind of calculation that 
avoids a cholesky factorization I would definitely take a look.

> On Jun 3, 2014, at 5:41 AM, Tony Kelman <[email protected]> wrote:
> 
> Whoops, didn't finish writing a sentence there... was saying that most of the 
> algorithms in Optim don't implement equality constraints, but you've got 
> other choices of optimization solvers.
> 
> 
>> On Tuesday, June 3, 2014 3:39:44 AM UTC-7, Tony Kelman wrote:
>> Thomas,
>> 
>> If you're trying to invert (factorize) a matrix as part of your objective 
>> function, it's generally a much better idea to reformulate your optimization 
>> problem and use a different algorithm. Anywhere that you're essentially 
>> trying to use inv(A)*x, you should try introducing a new variable y and 
>> corresponding linear equality constraint x = A * y. Most of the algorithms 
>> in Optim  If your objective function in terms of x and y is linear, then you 
>> can use Clp, GLPK, etc. If it's still some complicated nonlinear thing even 
>> in terms of x and y, use JuMP with Ipopt. NLopt might work too, but it's not 
>> tied into the convenient JuMP reverse-mode autodiff framework yet.
>> 
>> -Tony
>> 
>> 
>>> On Monday, June 2, 2014 11:43:32 AM UTC-7, Thomas Covert wrote:
>>> I'm really excited to use DualNumbers and the autodiff option in Optim.jl.  
>>> However, the functions I would like to optimize often make use of linear 
>>> algebra, including cholesky factorizations.  At present, DualNumbers.jl 
>>> does not implement a cholesky factorization.  My first pass at a workaround 
>>> for this was to just write out a standard cholesky solver  (i.e., just 
>>> writing out the equations listed in wikipedia).  This seems to work fine - 
>>> i.e., my "choldn()" function factors a DualNumber array into upper and 
>>> lower triangular matrices whose product is the input.  
>>> 
>>> However, my "choldn" function listed below is MUCH slower than the LAPACK 
>>> call built into chol().  I can't say I'm surprised by this, but the speed 
>>> hit makes the whole enterprise of using autodiff with DualNumbers 
>>> considerably less useful.
>>> 
>>> I was hoping to find some neat linear algebra trick that would let me 
>>> compute a DualNumber cholesky factorization without having to resort to 
>>> non-LAPACK code, but I haven't found it yet.  That is, I figured that I 
>>> could compute the cholesky factorization of the real part of the matrix and 
>>> then separately compute a matrix which represented the DualNumber part.  
>>> However, I've not yet found a math text describing such a beast.
>>> 
>>> Any ideas?  Am I wrong in thinking that a Cholesky decomposition of a 
>>> square array with DualNumbers is a well-defined mathematical object?
>>> 
>>> I've included my code for "choldn()" below.
>>> 
>>> Thanks in advance.
>>> 
>>> 
>>> CODE 
>>> 
>>> function choldn!(A::Array)
>>>     N = size(A,1);
>>>     T = eltype(A);
>>>     tmp = zero(T);
>>>     for j = 1:N # cols                                                      
>>>    
>>>         for i = j:N # rows                                                  
>>>    
>>>             if i == j
>>>                 if i == 1
>>>                     A[i,j] = sqrt(A[i,j]);
>>>                 else
>>>                     tmp = zero(T);
>>>                     for k = 1:j-1
>>>                         tmp += A[j,k] ^ 2;
>>>                     end
>>>                     A[i,j] = sqrt(A[i,j] - tmp);
>>>                 end
>>>             else
>>>                 if j == 1
>>>                     A[i,j] = A[i,j] / A[j,j];
>>>                 else
>>>                     tmp = zero(T);
>>>                     for k = 1:j-1
>>>                         tmp += A[i,k] * A[j,k];
>>>                     end
>>>                     A[i,j] = (A[i,j] - tmp) / A[j,j];
>>>                 end
>>>             end
>>>         end
>>>     end
>>>     # then zero out the non-cholesky stuff                                  
>>>    
>>>     for i = 1:N
>>>         if i < N
>>>             for j = i+1:N
>>>                 A[i,j] = zero(T);
>>>             end
>>>         end
>>>     end
>>> end
>>> 

Reply via email to