Re: [julia-users] Optim.jl: unexpected results when using gradient-based methods

John Myles White Tue, 20 May 2014 08:48:51 -0700

Glad that you were able to figure out the source of your problems.

It would be good to get a sense of the amount of time spent inside your 
objective function vs. the amount of time spent in the code for optimize(). In 
general, my experience is that >>90% of the compute time for an optimization 
problem is spent in the objective function itself. If you instrument your 
objective function to produce timing information on each call, that would help 
a lot since you could then get a sense of how much time is being spent in the 
code for optimize() after accounting for your function itself.


It’s also worth keeping in mind that your use of implicit finite differencing 
means that your objective function is being called a lot more times than 
theoretically necessary, so that any minor performance issue in it will very 
substantially slow down the solver.

Regarding you objective function’s code, I suspect that the combination of 
global variables and memory-allocating vectorized arithmetic means that your 
objective function might be a good bit slower in Julia than in Matlab. Matlab 
seems to be a little better about garbage collection for vectorized arithmetic 
and Julia is generally not able to optimize code involving global variables.

Hope that points you in the right direction.

 — John

On May 20, 2014, at 8:34 AM, Holger Stichnoth <[email protected]> wrote:

> Hi Andreas,
> hi John,
> hi Miles (via julia-opt, where I mistakenly also posted my question 
> yesterday),
> 
> Thanks for your help. Here is the link to the Gist I created: 
> https://gist.github.com/anonymous/5f95ab1afd241c0a5962
> 
> In the process of producing a minimal (non-)working example, I discovered 
> that the unexpected results are due to the truncation of the logit choice 
> probabilities in the objective function. Optim.optimize() is sensitive to 
> this when method = :l_bfgs is used. With method = :nelder_mead, everything 
> works fine. When I comment out the truncation, :l_bfgs works as well. 
> However, I need to increase the xtol from its default of 1e-12 to at least 
> 1e-10, otherwise I get the error that the linesearch failed to converge.
> 
> I guess I should just do without the truncation. The logit probabilities are 
> between 0 and 1 by construction anyway. I had just copied the truncation code 
> from a friend who had told me that probabilities that are too close to 0 or 1 
> sometimes cause numerical problems in his Matlab code of the same function. 
> With Optim.optimize(), it seems to be the other way around, i.e. moving the 
> probabilities further away from 0 or 1 (even by tiny amounts) means that the 
> stability of the (gradient-based) algorithm is reduced.
> 
> So for me, the problem is solved. The problem was not with Optim.jl, but with 
> my own code.
> 
> The only other thing that I discovered when trying out Julia and Optim.jl is 
> that the optimization is currently considerably slower than Matlab's fminunc. 
> From the Gist I provided above, are there any potential performance 
> improvements that I am missing out on?
> 
> Best wishes,
> Holger
> 
> 
> On Monday, 19 May 2014 14:51:16 UTC+1, John Myles White wrote:
> If you can, please do share an example of your code. Logit-style models are 
> in general numerically unstable, so it would be good to see how exactly 
> you’ve coded things up.
> 
> One thing you may be able to do is use automatic differentiation via the 
> autodiff = true keyword to optimize, but that assumes that your objective 
> function is written in completely pure Julia code (which means, for example, 
> that your code must not call any of functions not written in Julia provided 
> by Distributions.jl).
> 
>  — John
> 
> On May 19, 2014, at 4:09 AM, Andreas Noack Jensen <[email protected]> 
> wrote:
> 
>> What is the output of versioninfo() and Pkg.installed("Optim")? Also, would 
>> it be possible to make a gist with your code?
>> 
>> 
>> 2014-05-19 12:44 GMT+02:00 Holger Stichnoth <[email protected]>:
>>  Hello,
>> 
>> I installed Julia a couple of days ago and was impressed how easy it was to 
>> make the switch from Matlab and to parallelize my code
>> (something I had never done before in any language; I'm an economist with 
>> only limited programming experience, mainly in Stata and Matlab).
>> 
>> However, I ran into a problem when using Optim.jl for Maximum Likelihood 
>> estimation of a conditional logit model. With the default Nelder-Mead 
>> algorithm, optimize from the Optim.jl package gave me the same result that I 
>> had obtained in Stata and Matlab.
>> 
>> With gradient-based methods such as BFGS, however, the algorithm jumped from 
>> the starting values to parameter values that are completely different. This 
>> happened for all thr starting values I tried, including the case in which I 
>> took a vector that is closed to the optimum from the Nelder-Mead algorithm.  
>> 
>> The problem seems to be that the algorithm tried values so large (in 
>> absolute value) that this caused problems for the objective
>> function, where I call exponential functions into which these parameter 
>> values enter. As a result, the optimization based on the BFGS algorithm did 
>> not produce the expected optimum.
>> 
>> While I could try to provide the analytical gradient in this simple case, I 
>> was planning to use Julia for Maximum Likelihood or Simulated Maximum 
>> Likelihood estimation in cases where the gradient is more difficult to 
>> derive, so it would be good if I could make the optimizer run also with 
>> numerical gradients.
>> 
>> I suspect that my problems with optimize from Optim.jl could have something 
>> to do with the gradient() function. In the example below, for instance, I do 
>> not understand why the output of the gradient function includes values such 
>> as 11470.7, given that the function values differ only minimally.
>> 
>> Best wishes,
>> Holger
>> 
>> 
>> julia> Optim.gradient(clogit_ll,zeros(4))
>> 60554544523933395e-22
>> 0Op
>> 0
>> 0
>> 
>> 14923.564009972584
>> -60554544523933395e-22
>> 0
>> 0
>> 0
>> 
>> 14923.565228435104
>> 0
>> 60554544523933395e-22
>> 0
>> 0
>> 
>> 14923.569064311248
>> 0
>> -60554544523933395e-22
>> 0
>> 0
>> 
>> 14923.560174904109
>> 0
>> 0
>> 60554544523933395e-22
>> 0
>> 
>> 14923.63413848258
>> 0
>> 0
>> -60554544523933395e-22
>> 0
>> 
>> 14923.495218282553
>> 0
>> 0
>> 0
>> 60554544523933395e-22
>> 
>> 14923.58699717058
>> 0
>> 0
>> 0
>> -60554544523933395e-22
>> 
>> 14923.54224130672
>> 4-element Array{Float64,1}:
>>   -100.609
>>    734.0
>>  11470.7
>>   3695.5
>> 
>> function clogit_ll(beta::Vector)
>> 
>>     # Print the parameters and the return value to
>>     # check how gradient() and optimize() work.
>>     println(beta) 
>>     println(-sum(compute_ll(beta,T,0)))
>> 
>>     # compute_ll computes the individual likelihood contributions
>>     # in the sample. T is the number of periods in the panel. The 0
>>     # is not used in this simple example. In related functions, I
>>     # pass on different values here to estimate finite mixtures of
>>     # the conditional logit model.
>>     return -sum(compute_ll(beta,T,0))
>> end
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> -- 
>> Med venlig hilsen
>> 
>> Andreas Noack Jensen
>

Re: [julia-users] Optim.jl: unexpected results when using gradient-based methods

Reply via email to