Alright if K is dense then Ipopt may not be your best choice. How do the 
theta parameters enter into K(theta)? Is the problem convex? Would the dual 
problem be easier to work with? Have you read Boyd and Vandenberghe more 
recently than I have? This sounds suspiciously like it can be expressed as 
a semidefinite program, though I think the Julia interfaces to good SDP 
solvers are still WIP.

Yes the majority of solvers for convex optimization rely on a Cholesky 
decomposition at some point, the question is whether or not you pose the 
problem in such a way that you have to differentiate the factors. I suspect 
you shouldn't have to, but also haven't worked with this particular type of 
problem in a few years.


On Tuesday, June 3, 2014 4:33:01 AM UTC-7, Thomas Covert wrote:
>
> Thanks for the suggestion, Tony.  The problem I'm working with is that of 
> fitting a gaussian process to data, so my objective function (the log 
> likelihood of a GP given some data x) includes both an x' inv(K(theta)) x 
> term and a log determinant of K(theta).  Here K() is a big dense  positive 
> definite weight matrix and theta is the parameter(s) controlling the 
> weights.
>
> As far as I can tell, the fastest and most numerically stable way to 
> implement these calculations is with a cholesky decomposition of K(theta). 
>  In my code where I've already done the work of finding derivatives by 
> hand, I use a PDMat for K and the associated fast routines (like invquad 
> and logdet) available for PDMats.  To use the DualNumbers autodiff code, I 
> needed a way to cholesky factorize a symmetric matrix of dual numbers 
> (which I assume was something like K(theta + epsilon) in the autodiff code).
>
> However if you think there's a smarter way to do this kind of calculation 
> that avoids a cholesky factorization I would definitely take a look.
>
> On Jun 3, 2014, at 5:41 AM, Tony Kelman <[email protected] <javascript:>> 
> wrote:
>
> Whoops, didn't finish writing a sentence there... was saying that most of 
> the algorithms in Optim don't implement equality constraints, but you've 
> got other choices of optimization solvers.
>
>
> On Tuesday, June 3, 2014 3:39:44 AM UTC-7, Tony Kelman wrote:
>>
>> Thomas,
>>
>> If you're trying to invert (factorize) a matrix as part of your objective 
>> function, it's generally a much better idea to reformulate your 
>> optimization problem and use a different algorithm. Anywhere that you're 
>> essentially trying to use inv(A)*x, you should try introducing a new 
>> variable y and corresponding linear equality constraint x = A * y. Most of 
>> the algorithms in Optim  If your objective function in terms of x and y is 
>> linear, then you can use Clp, GLPK, etc. If it's still some complicated 
>> nonlinear thing even in terms of x and y, use JuMP with Ipopt. NLopt might 
>> work too, but it's not tied into the convenient JuMP reverse-mode autodiff 
>> framework yet.
>>
>> -Tony
>>
>>
>> On Monday, June 2, 2014 11:43:32 AM UTC-7, Thomas Covert wrote:
>>>
>>> I'm really excited to use DualNumbers and the autodiff option in 
>>> Optim.jl.  However, the functions I would like to optimize often make use 
>>> of linear algebra, including cholesky factorizations.  At present, 
>>> DualNumbers.jl does not implement a cholesky factorization.  My first pass 
>>> at a workaround for this was to just write out a standard cholesky solver 
>>>  (i.e., just writing out the equations listed in wikipedia).  This seems to 
>>> work fine - i.e., my "choldn()" function factors a DualNumber array into 
>>> upper and lower triangular matrices whose product is the input.  
>>>
>>> However, my "choldn" function listed below is MUCH slower than the 
>>> LAPACK call built into chol().  I can't say I'm surprised by this, but the 
>>> speed hit makes the whole enterprise of using autodiff with DualNumbers 
>>> considerably less useful.
>>>
>>> I was hoping to find some neat linear algebra trick that would let me 
>>> compute a DualNumber cholesky factorization without having to resort to 
>>> non-LAPACK code, but I haven't found it yet.  That is, I figured that I 
>>> could compute the cholesky factorization of the real part of the matrix and 
>>> then separately compute a matrix which represented the DualNumber part. 
>>>  However, I've not yet found a math text describing such a beast.
>>>
>>> Any ideas?  Am I wrong in thinking that a Cholesky decomposition of a 
>>> square array with DualNumbers is a well-defined mathematical object?
>>>
>>> I've included my code for "choldn()" below.
>>>
>>> Thanks in advance.
>>>
>>>
>>> CODE 
>>>
>>> function choldn!(A::Array)
>>>
>>>     N = size(A,1);
>>>
>>>     T = eltype(A);
>>>
>>>     tmp = zero(T);
>>>
>>>     for j = 1:N # cols                                                 
>>>          
>>>
>>>         for i = j:N # rows                                             
>>>          
>>>
>>>             if i == j
>>>
>>>                 if i == 1
>>>
>>>                     A[i,j] = sqrt(A[i,j]);
>>>
>>>                 else
>>>
>>>                     tmp = zero(T);
>>>
>>>                     for k = 1:j-1
>>>
>>>                         tmp += A[j,k] ^ 2;
>>>
>>>                     end
>>>
>>>                     A[i,j] = sqrt(A[i,j] - tmp);
>>>
>>>                 end
>>>
>>>             else
>>>
>>>                 if j == 1
>>>
>>>                     A[i,j] = A[i,j] / A[j,j];
>>>
>>>                 else
>>>
>>>                     tmp = zero(T);
>>>
>>>                     for k = 1:j-1
>>>
>>>                         tmp += A[i,k] * A[j,k];
>>>
>>>                     end
>>>
>>>                     A[i,j] = (A[i,j] - tmp) / A[j,j];
>>>
>>>                 end
>>>
>>>             end
>>>
>>>         end
>>>
>>>     end
>>>
>>>     # then zero out the non-cholesky stuff                             
>>>          
>>>
>>>     for i = 1:N
>>>
>>>         if i < N
>>>
>>>             for j = i+1:N
>>>
>>>                 A[i,j] = zero(T);
>>>
>>>             end
>>>
>>>         end
>>>
>>>     end
>>>
>>> end
>>>
>>>

Reply via email to