Re: [julia-users] Optim.jl: unexpected results when using gradient-based methods

John Myles White Thu, 22 May 2014 07:35:36 -0700

How many function evaluations is Matlab performing and how many is Julia 
performing?


 — John

On May 22, 2014, at 6:53 AM, Holger Stichnoth <[email protected]> wrote:

> Thanks, it's faster now (by roughly a factor of 3 on my computer), but still 
> considerably slower than fminunc:
> 
> Averages over 20 runs:
> Julia/Optim.optimize: 10.5s
> Matlab/fminunc: 2.6s
> 
> Here are my Matlab settings:
> options = optimset('Display', 'iter', ...
>      'MaxIter', 2500, 'MaxFunEvals', 500000, ...
>      'TolFun', 1e-6, 'TolX', 1e-6, ...
>      'GradObj', 'off', 'DerivativeCheck', 'off');
> 
> startb      = ones(1,nVar)';
> [estim_clo, ll_clo]= ...
>      fminunc(@(param)clogit_ll(param,data), ...
>      startb,options);
> 
> Could the speed issue be related to the following messages that I get when I 
> run the Julia code?
> C:\Users\User\Documents\References\Software\Julia\mlubin>julia main.jl
> Warning: could not import Base.foldl into NumericExtensions
> Warning: could not import Base.foldr into NumericExtensions
> Warning: could not import Base.sum! into NumericExtensions
> Warning: could not import Base.maximum! into NumericExtensions
> Warning: could not import Base.minimum! into NumericExtensions
> 
> 
> 
> 
> Am Donnerstag, 22. Mai 2014 14:18:36 UTC+1 schrieb Miles Lubin:
> I was able to get a nearly 5x speedup by avoiding the matrix allocation and 
> making the accumulators type stable: 
> https://gist.github.com/mlubin/055690ddf2466e98bba6
> 
> How does this compare with Matlab now?
> 
> On Thursday, May 22, 2014 6:38:44 AM UTC-4, Holger Stichnoth wrote:
> @ John: You are right, when I specify the function as clogit_ll(beta::Vector) 
> instead of clogit_ll(beta::Vector{Float64}), autodiff = true works fine. 
> Thanks for your help!
> 
> @ Tim: I have set the rather strict default convergence criteria of 
> Optim.optimize to Matlab's default values for fminunc, but the speed 
> difference is still there.
> 
> @ Miles/John: Getting rid of the global variables through closures and 
> devectorizing made the optimization _slower_ not faster in my case: 
> https://gist.github.com/stichnoth/7f251ded83dcaa384273. I was surprised to 
> see this as I expected a speed increase myself. 
>  
> 
> Am Mittwoch, 21. Mai 2014 16:48:51 UTC+1 schrieb Miles Lubin:
> Just to extend on what John said, also think that if you can restructure the 
> code to devectorize it and avoid using global variables, you'll see *much* 
> better performance. 
> 
> The way to avoid globals is by using closures, for example:
> function foo(x, data)
>     ...
> end
> 
> 
> ...
> data_raw = readcsv(file)
> data = reshape(data_raw, nObs, nChoices*(1+nVar), T)
> 
> 
> 
> Optim.optimize(x-> foo(x,data), ...)
> 
> 
> 
> On Tuesday, May 20, 2014 11:47:39 AM UTC-4, John Myles White wrote:
> Glad that you were able to figure out the source of your problems.
> 
> It would be good to get a sense of the amount of time spent inside your 
> objective function vs. the amount of time spent in the code for optimize(). 
> In general, my experience is that >>90% of the compute time for an 
> optimization problem is spent in the objective function itself. If you 
> instrument your objective function to produce timing information on each 
> call, that would help a lot since you could then get a sense of how much time 
> is being spent in the code for optimize() after accounting for your function 
> itself.
> 
> It’s also worth keeping in mind that your use of implicit finite differencing 
> means that your objective function is being called a lot more times than 
> theoretically necessary, so that any minor performance issue in it will very 
> substantially slow down the solver.
> 
> Regarding you objective function’s code, I suspect that the combination of 
> global variables and memory-allocating vectorized arithmetic means that your 
> objective function might be a good bit slower in Julia than in Matlab. Matlab 
> seems to be a little better about garbage collection for vectorized 
> arithmetic and Julia is generally not able to optimize code involving global 
> variables.
> 
> Hope that points you in the right direction.
> 
>  — John
> 
> On May 20, 2014, at 8:34 AM, Holger Stichnoth <[email protected]> wrote:
> 
>> Hi Andreas,
>> hi John,
>> hi Miles (via julia-opt, where I mistakenly also posted my question 
>> yesterday),
>> 
>> Thanks for your help. Here is the link to the Gist I created: 
>> https://gist.github.com/anonymous/5f95ab1afd241c0a5962
>> 
>> In the process of producing a minimal (non-)working example, I discovered 
>> that the unexpected results are due to the truncation of the logit choice 
>> probabilities in the objective function. Optim.optimize() is sensitive to 
>> this when method = :l_bfgs is used. With method = :nelder_mead, everything 
>> works fine. When I comment out the truncation, :l_bfgs works as well. 
>> However, I need to increase the xtol from its default of 1e-12 to at least 
>> 1e-10, otherwise I get the error that the linesearch failed to converge.
>> 
>> I guess I should just do without the truncation. The logit probabilities are 
>> between 0 and 1 by construction anyway. I had just copied the truncation 
>> code from a friend who had told me that probabilities that are too close to 
>> 0 or 1 sometimes cause numerical problems in his Matlab code of the same 
>> function. With Optim.optimize(), it seems to be the other way around, i.e. 
>> moving the probabilities further away from 0 or 1 (even by tiny amounts) 
>> means that the stability of the (gradient-based) algorithm is reduced.
>> 
>> So for me, the problem is solved. The problem was not with Optim.jl, but 
>> with my own code.
>> 
>> The only other thing that I discovered when trying out Julia and Optim.jl is 
>> that the optimization is currently considerably slower than Matlab's 
>> fminunc. From the Gist I provided above, are there any potential performance 
>> improvements that I am missing out on?
>> 
>> Best wishes,
>> Holger
>> 
>> 
>> On Monday, 19 May 2014 14:51:16 UTC+1, John Myles White wrote:
>> If you can, please do share an example of your code. Logit-style models are 
>> in general numerically unstable, so it would be good to see how exactly 
>> you’ve coded things up.
>> 
>> One thing you may be able to do is use automatic differentiation via the 
>> autodiff = true keyword to optimize, but that assumes that your objective 
>> function is written in completely pure Julia code (which means, for example, 
>> that your code must not call any of functions not written in Julia provided 
>> by Distributions.jl).
>> 
>>  — John
>> 
>> On May 19, 2014, at 4:09 AM, Andreas Noack Jensen <[email protected]> 
>> wrote:
>> 
>>> What is the output of versioninfo() and Pkg.installed("Optim")? Also, would 
>>> it be possible to make a gist with your code?
>>> 
>>> 
>>> 2014-05-19 12:44 GMT+02:00 Holger Stichnoth <[email protected]>:
>>>  Hello,
>>> 
>>> I installed Julia a couple of days ago and was impressed how easy it was to 
>>> make the switch from Matlab and to parallelize my code
>>> (something I had never done before in any language; I'm an economist with 
>>> only limited programming experience, mainly in Stata and Matlab).
>>> 
>>> However, I ran into a problem when using Optim.jl for Maximum Likelihood 
>>> estimation of a conditional logit model. With the default Nelder-Mead 
>>> algorithm, optimize from the Optim.jl package gave me the same result that 
>>> I had obtained in Stata and Matlab.
>>> 
>>> With gradient-based methods such as BFGS, however, the algorithm jumped 
>>> from the starting values to parameter values that are completely different. 
>>> This happened for all thr starting values I tried, including the case in 
>>> which I took a vector that is closed to the optimum from the Nelder-Mead 
>>> algorithm.  
>>> 
>>> The problem seems to be that the algorithm tried values so large (in 
>>> absolute value) that this caused problems for the objective
>>> function, where I call exponential functions into which these parameter 
>>> values enter. As a result, the optimization based on the BFGS algorithm did 
>>> not produce the expected optimum.
>>> 
>>> While I could try to provide the analytical gradient in this simple case, I 
>>> was planning to use Julia for Maximum Likelihood or Simulated Maximum 
>>> Likelihood estimation in cases where the gradient is more difficult to 
>>> derive, so it would be good if I could make the optimizer run also with 
>>> numerical gradients.
>>> 
>>> I suspect that my problems with optimize from Optim.jl could have something 
>>> to do with the gradient() function. In the example below, for instance, I 
>>> do not understand why the output of the gradient function includes values 
>>> such as 11470.7, given that the function values differ only minimally.
>>> 
>>> Best wishes,
>>> Holger
>>> 
>>> 
>>> julia> Optim.gradient(clogit_ll,zeros(4))
>>> 60554544523933395e-22
>>> 0Op
>>> 0
>>> 0
>>> 
>>> 14923.564009972584
>>> -60554544523933395e-22
>>> 0
>>> 0
>>> 0
>>> 
>>> 14923.565228435104
>>> 0
>>> 60554544523933395e-22
>>> 0
>>> 0
>>> 
>>> 14923.569064311248
>>> 0
>>> -60554544523933395e-22
>>> 0
>>> 0
>>> 
>>> 14923.560174904109
>>> 0
>>> 0
>>> 60554544523933395e-22
>>> 0
>>> 
>>> 14923.63413848258
>>> 0
>>> 0
>>> -60554544523933395e-22
>>> 0
>>> 
>>> 14923.495218282553
>>> 0
>>> 0
>>> 0
>>> 60554544523933395e-22
>>> 
>>> 14923.58699717058
>>> 0
>>> 0
>>> 0
>>> -60554544523933395e-22
>>> 
>>> 14923.54224130672
>>> 4-element Array{Float64,1}:
>>>   -100.609
>>>    734.0
>>>  11470.7
>>>   3695.5
>>> 
>>> function clogit_ll(beta::Vector)
>>> 
>>>     # Print the parameters and the return value to
>>>     # check how gradient() and optimize() work.
>>>     println(beta) 
>>>     println(-sum(compute_ll(beta,T,0)))
>>> 
>>>     # compute_ll computes the individual likelihood contributions
>>>     # in the sample. T is the number of periods in the panel. The 0
>>>     # is not used in this simple example. In related functions, I
>>>     # pass on different values here to estimate finite mixtures of
>>>     # the conditional logit model.
>>>     return -sum(compute_ll(beta,T,0))
>>> end
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> Med venlig hilsen
>>> 
>>> Andreas Noack Jensen
>> 
>

Re: [julia-users] Optim.jl: unexpected results when using gradient-based methods

Reply via email to