In the GP setting I'm using, K is the set of pair wise gaussian kernel 
evaluations between all the points in my data.  So if a point is x_i,z_i (x is 
the dependent variable and the z's are effectively explanatory variables), K_ij 
is more or less exp(-.5 * (z_i - z_j)^2 * theta).  

I don't think the problem is convex, but MATLAB's fminunc + tens of random 
starting values always converge to the same point.

Anyway, a previous commenter pointed out that LU factorization is implemented 
for dual numbers, so I'll give that a try.


> On Jun 3, 2014, at 7:15 AM, Tony Kelman <[email protected]> wrote:
> 
> Alright if K is dense then Ipopt may not be your best choice. How do the 
> theta parameters enter into K(theta)? Is the problem convex? Would the dual 
> problem be easier to work with? Have you read Boyd and Vandenberghe more 
> recently than I have? This sounds suspiciously like it can be expressed as a 
> semidefinite program, though I think the Julia interfaces to good SDP solvers 
> are still WIP.
> 
> Yes the majority of solvers for convex optimization rely on a Cholesky 
> decomposition at some point, the question is whether or not you pose the 
> problem in such a way that you have to differentiate the factors. I suspect 
> you shouldn't have to, but also haven't worked with this particular type of 
> problem in a few years.
> 
> 
>> On Tuesday, June 3, 2014 4:33:01 AM UTC-7, Thomas Covert wrote:
>> Thanks for the suggestion, Tony.  The problem I'm working with is that of 
>> fitting a gaussian process to data, so my objective function (the log 
>> likelihood of a GP given some data x) includes both an x' inv(K(theta)) x 
>> term and a log determinant of K(theta).  Here K() is a big dense  positive 
>> definite weight matrix and theta is the parameter(s) controlling the weights.
>> 
>> As far as I can tell, the fastest and most numerically stable way to 
>> implement these calculations is with a cholesky decomposition of K(theta).  
>> In my code where I've already done the work of finding derivatives by hand, 
>> I use a PDMat for K and the associated fast routines (like invquad and 
>> logdet) available for PDMats.  To use the DualNumbers autodiff code, I 
>> needed a way to cholesky factorize a symmetric matrix of dual numbers (which 
>> I assume was something like K(theta + epsilon) in the autodiff code).
>> 
>> However if you think there's a smarter way to do this kind of calculation 
>> that avoids a cholesky factorization I would definitely take a look.
>> 
>>> On Jun 3, 2014, at 5:41 AM, Tony Kelman <[email protected]> wrote:
>>> 
>>> Whoops, didn't finish writing a sentence there... was saying that most of 
>>> the algorithms in Optim don't implement equality constraints, but you've 
>>> got other choices of optimization solvers.
>>> 
>>> 
>>>> On Tuesday, June 3, 2014 3:39:44 AM UTC-7, Tony Kelman wrote:
>>>> Thomas,
>>>> 
>>>> If you're trying to invert (factorize) a matrix as part of your objective 
>>>> function, it's generally a much better idea to reformulate your 
>>>> optimization problem and use a different algorithm. Anywhere that you're 
>>>> essentially trying to use inv(A)*x, you should try introducing a new 
>>>> variable y and corresponding linear equality constraint x = A * y. Most of 
>>>> the algorithms in Optim  If your objective function in terms of x and y is 
>>>> linear, then you can use Clp, GLPK, etc. If it's still some complicated 
>>>> nonlinear thing even in terms of x and y, use JuMP with Ipopt. NLopt might 
>>>> work too, but it's not tied into the convenient JuMP reverse-mode autodiff 
>>>> framework yet.
>>>> 
>>>> -Tony
>>>> 
>>>> 
>>>>> On Monday, June 2, 2014 11:43:32 AM UTC-7, Thomas Covert wrote:
>>>>> I'm really excited to use DualNumbers and the autodiff option in 
>>>>> Optim.jl.  However, the functions I would like to optimize often make use 
>>>>> of linear algebra, including cholesky factorizations.  At present, 
>>>>> DualNumbers.jl does not implement a cholesky factorization.  My first 
>>>>> pass at a workaround for this was to just write out a standard cholesky 
>>>>> solver  (i.e., just writing out the equations listed in wikipedia).  This 
>>>>> seems to work fine - i.e., my "choldn()" function factors a DualNumber 
>>>>> array into upper and lower triangular matrices whose product is the 
>>>>> input.  
>>>>> 
>>>>> However, my "choldn" function listed below is MUCH slower than the LAPACK 
>>>>> call built into chol().  I can't say I'm surprised by this, but the speed 
>>>>> hit makes the whole enterprise of using autodiff with DualNumbers 
>>>>> considerably less useful.
>>>>> 
>>>>> I was hoping to find some neat linear algebra trick that would let me 
>>>>> compute a DualNumber cholesky factorization without having to resort to 
>>>>> non-LAPACK code, but I haven't found it yet.  That is, I figured that I 
>>>>> could compute the cholesky factorization of the real part of the matrix 
>>>>> and then separately compute a matrix which represented the DualNumber 
>>>>> part.  However, I've not yet found a math text describing such a beast.
>>>>> 
>>>>> Any ideas?  Am I wrong in thinking that a Cholesky decomposition of a 
>>>>> square array with DualNumbers is a well-defined mathematical object?
>>>>> 
>>>>> I've included my code for "choldn()" below.
>>>>> 
>>>>> Thanks in advance.
>>>>> 
>>>>> 
>>>>> CODE 
>>>>> 
>>>>> function choldn!(A::Array)
>>>>>     N = size(A,1);
>>>>>     T = eltype(A);
>>>>>     tmp = zero(T);
>>>>>     for j = 1:N # cols                                                    
>>>>>      
>>>>>         for i = j:N # rows                                                
>>>>>      
>>>>>             if i == j
>>>>>                 if i == 1
>>>>>                     A[i,j] = sqrt(A[i,j]);
>>>>>                 else
>>>>>                     tmp = zero(T);
>>>>>                     for k = 1:j-1
>>>>>                         tmp += A[j,k] ^ 2;
>>>>>                     end
>>>>>                     A[i,j] = sqrt(A[i,j] - tmp);
>>>>>                 end
>>>>>             else
>>>>>                 if j == 1
>>>>>                     A[i,j] = A[i,j] / A[j,j];
>>>>>                 else
>>>>>                     tmp = zero(T);
>>>>>                     for k = 1:j-1
>>>>>                         tmp += A[i,k] * A[j,k];
>>>>>                     end
>>>>>                     A[i,j] = (A[i,j] - tmp) / A[j,j];
>>>>>                 end
>>>>>             end
>>>>>         end
>>>>>     end
>>>>>     # then zero out the non-cholesky stuff                                
>>>>>      
>>>>>     for i = 1:N
>>>>>         if i < N
>>>>>             for j = i+1:N
>>>>>                 A[i,j] = zero(T);
>>>>>             end
>>>>>         end
>>>>>     end
>>>>> end
>>>>> 

Reply via email to