Re: [julia-users] Optim.jl: unexpected results when using gradient-based methods

Holger Stichnoth Thu, 22 May 2014 06:54:01 -0700

Thanks, it's faster now (by roughly a factor of 3 on my computer), but 
still considerably slower than fminunc:


Averages over 20 runs:
Julia/Optim.optimize: 10.5s
Matlab/fminunc: 2.6s

Here are my Matlab settings:
options = optimset('Display', 'iter', ...
     'MaxIter', 2500, 'MaxFunEvals', 500000, ...
     'TolFun', 1e-6, 'TolX', 1e-6, ...
     'GradObj', 'off', 'DerivativeCheck', 'off');

startb      = ones(1,nVar)';
[estim_clo, ll_clo]= ...
     fminunc(@(param)clogit_ll(param,data), ...
     startb,options);

Could the speed issue be related to the following messages that I get when 
I run the Julia code?
C:\Users\User\Documents\References\Software\Julia\mlubin>julia main.jl
Warning: could not import Base.foldl into NumericExtensions
Warning: could not import Base.foldr into NumericExtensions
Warning: could not import Base.sum! into NumericExtensions
Warning: could not import Base.maximum! into NumericExtensions
Warning: could not import Base.minimum! into NumericExtensions




Am Donnerstag, 22. Mai 2014 14:18:36 UTC+1 schrieb Miles Lubin:
>
> I was able to get a nearly 5x speedup by avoiding the matrix allocation 
> and making the accumulators type stable: 
> https://gist.github.com/mlubin/055690ddf2466e98bba6
>
> How does this compare with Matlab now?
>
> On Thursday, May 22, 2014 6:38:44 AM UTC-4, Holger Stichnoth wrote:
>>
>> @ John: You are right, when I specify the function as 
>> clogit_ll(beta::Vector) instead of clogit_ll(beta::Vector{Float64}), 
>> autodiff = true works fine. Thanks for your help!
>>
>> @ Tim: I have set the rather strict default convergence criteria of 
>> Optim.optimize to Matlab's default values for fminunc, but the speed 
>> difference is still there.
>>
>> @ Miles/John: Getting rid of the global variables through closures and 
>> devectorizing made the optimization _slower_ not faster in my case: 
>> https://gist.github.com/stichnoth/7f251ded83dcaa384273. I was surprised 
>> to see this as I expected a speed increase myself. 
>>  
>>
>> Am Mittwoch, 21. Mai 2014 16:48:51 UTC+1 schrieb Miles Lubin:
>>>
>>> Just to extend on what John said, also think that if you can restructure 
>>> the code to devectorize it and avoid using global variables, you'll see 
>>> *much* better performance. 
>>>
>>> The way to avoid globals is by using closures, for example:
>>> function foo(x, data)
>>>     ...
>>> end
>>>
>>>
>>> ...
>>> data_raw = readcsv(file)
>>> data = reshape(data_raw, nObs, nChoices*(1+nVar), T)
>>>
>>>
>>>
>>> Optim.optimize(x-> foo(x,data), ...)
>>>
>>>
>>>
>>> On Tuesday, May 20, 2014 11:47:39 AM UTC-4, John Myles White wrote:
>>>>
>>>> Glad that you were able to figure out the source of your problems.
>>>>
>>>> It would be good to get a sense of the amount of time spent inside your 
>>>> objective function vs. the amount of time spent in the code for 
>>>> optimize(). 
>>>> In general, my experience is that >>90% of the compute time for an 
>>>> optimization problem is spent in the objective function itself. If you 
>>>> instrument your objective function to produce timing information on each 
>>>> call, that would help a lot since you could then get a sense of how much 
>>>> time is being spent in the code for optimize() after accounting for your 
>>>> function itself.
>>>>
>>>> It’s also worth keeping in mind that your use of implicit finite 
>>>> differencing means that your objective function is being called a lot more 
>>>> times than theoretically necessary, so that any minor performance issue in 
>>>> it will very substantially slow down the solver.
>>>>
>>>> Regarding you objective function’s code, I suspect that the combination 
>>>> of global variables and memory-allocating vectorized arithmetic means that 
>>>> your objective function might be a good bit slower in Julia than in 
>>>> Matlab. 
>>>> Matlab seems to be a little better about garbage collection for vectorized 
>>>> arithmetic and Julia is generally not able to optimize code involving 
>>>> global variables.
>>>>
>>>> Hope that points you in the right direction.
>>>>
>>>>  — John
>>>>
>>>> On May 20, 2014, at 8:34 AM, Holger Stichnoth <[email protected]> 
>>>> wrote:
>>>>
>>>> Hi Andreas,
>>>> hi John,
>>>> hi Miles (via julia-opt, where I mistakenly also posted my question 
>>>> yesterday),
>>>>
>>>> Thanks for your help. Here is the link to the Gist I created: 
>>>> https://gist.github.com/anonymous/5f95ab1afd241c0a5962
>>>>
>>>> In the process of producing a minimal (non-)working example, I 
>>>> discovered that the unexpected results are due to the truncation of the 
>>>> logit choice probabilities in the objective function. Optim.optimize() is 
>>>> sensitive to this when method = :l_bfgs is used. With method = 
>>>> :nelder_mead, everything works fine. When I comment out the truncation, 
>>>> :l_bfgs works as well. However, I need to increase the xtol from its 
>>>> default of 1e-12 to at least 1e-10, otherwise I get the error that the 
>>>> linesearch failed to converge.
>>>>
>>>> I guess I should just do without the truncation. The logit 
>>>> probabilities are between 0 and 1 by construction anyway. I had just 
>>>> copied 
>>>> the truncation code from a friend who had told me that probabilities that 
>>>> are too close to 0 or 1 sometimes cause numerical problems in his Matlab 
>>>> code of the same function. With Optim.optimize(), it seems to be the other 
>>>> way around, i.e. moving the probabilities further away from 0 or 1 (even 
>>>> by 
>>>> tiny amounts) means that the stability of the (gradient-based) algorithm 
>>>> is 
>>>> reduced.
>>>>
>>>> So for me, the problem is solved. The problem was not with Optim.jl, 
>>>> but with my own code.
>>>>
>>>> The only other thing that I discovered when trying out Julia and 
>>>> Optim.jl is that the optimization is currently considerably slower than 
>>>> Matlab's fminunc. From the Gist I provided above, are there any potential 
>>>> performance improvements that I am missing out on?
>>>>
>>>> Best wishes,
>>>> Holger
>>>>
>>>>
>>>> On Monday, 19 May 2014 14:51:16 UTC+1, John Myles White wrote:
>>>>>
>>>>> If you can, please do share an example of your code. Logit-style 
>>>>> models are in general numerically unstable, so it would be good to see 
>>>>> how 
>>>>> exactly you’ve coded things up.
>>>>>
>>>>> One thing you may be able to do is use automatic differentiation via 
>>>>> the autodiff = true keyword to optimize, but that assumes that your 
>>>>> objective function is written in completely pure Julia code (which means, 
>>>>> for example, that your code must not call any of functions not written in 
>>>>> Julia provided by Distributions.jl).
>>>>>
>>>>>  — John
>>>>>
>>>>> On May 19, 2014, at 4:09 AM, Andreas Noack Jensen <
>>>>> [email protected]> wrote:
>>>>>
>>>>> What is the output of versioninfo() and Pkg.installed("Optim")? Also, 
>>>>> would it be possible to make a gist with your code?
>>>>>
>>>>>
>>>>> 2014-05-19 12:44 GMT+02:00 Holger Stichnoth <[email protected]>:
>>>>>
>>>>>>  Hello,
>>>>>>
>>>>>> I installed Julia a couple of days ago and was impressed how easy it 
>>>>>> was to make the switch from Matlab and to parallelize my code
>>>>>> (something I had never done before in any language; I'm an economist 
>>>>>> with only limited programming experience, mainly in Stata and Matlab).
>>>>>>
>>>>>> However, I ran into a problem when using Optim.jl for Maximum 
>>>>>> Likelihood estimation of a conditional logit model. With the default 
>>>>>> Nelder-Mead algorithm, optimize from the Optim.jl package gave me the 
>>>>>> same 
>>>>>> result that I had obtained in Stata and Matlab.
>>>>>>
>>>>>> With gradient-based methods such as BFGS, however, the algorithm 
>>>>>> jumped from the starting values to parameter values that are completely 
>>>>>> different. This happened for all thr starting values I tried, including 
>>>>>> the 
>>>>>> case in which I took a vector that is closed to the optimum from the 
>>>>>> Nelder-Mead algorithm.  
>>>>>>
>>>>>> The problem seems to be that the algorithm tried values so large (in 
>>>>>> absolute value) that this caused problems for the objective
>>>>>> function, where I call exponential functions into which these 
>>>>>> parameter values enter. As a result, the optimization based on the BFGS 
>>>>>> algorithm did not produce the expected optimum.
>>>>>>
>>>>>> While I could try to provide the analytical gradient in this simple 
>>>>>> case, I was planning to use Julia for Maximum Likelihood or Simulated 
>>>>>> Maximum Likelihood estimation in cases where the gradient is more 
>>>>>> difficult 
>>>>>> to derive, so it would be good if I could make the optimizer run also 
>>>>>> with 
>>>>>> numerical gradients.
>>>>>>
>>>>>> I suspect that my problems with optimize from Optim.jl could have 
>>>>>> something to do with the gradient() function. In the example below, for 
>>>>>> instance, I do not understand why the output of the gradient function 
>>>>>> includes values such as 11470.7, given that the function values differ 
>>>>>> only 
>>>>>> minimally.
>>>>>>
>>>>>> Best wishes,
>>>>>> Holger
>>>>>>
>>>>>>
>>>>>> julia> Optim.gradient(clogit_ll,zeros(4))
>>>>>> 60554544523933395e-22
>>>>>> 0Op
>>>>>> 0
>>>>>> 0
>>>>>>
>>>>>> 14923.564009972584
>>>>>> -60554544523933395e-22
>>>>>> 0
>>>>>> 0
>>>>>> 0
>>>>>>
>>>>>> 14923.565228435104
>>>>>> 0
>>>>>> 60554544523933395e-22
>>>>>> 0
>>>>>> 0
>>>>>>
>>>>>> 14923.569064311248
>>>>>> 0
>>>>>> -60554544523933395e-22
>>>>>> 0
>>>>>> 0
>>>>>>
>>>>>> 14923.560174904109
>>>>>> 0
>>>>>> 0
>>>>>> 60554544523933395e-22
>>>>>> 0
>>>>>>
>>>>>> 14923.63413848258
>>>>>> 0
>>>>>> 0
>>>>>> -60554544523933395e-22
>>>>>> 0
>>>>>>
>>>>>> 14923.495218282553
>>>>>> 0
>>>>>> 0
>>>>>> 0
>>>>>> 60554544523933395e-22
>>>>>>
>>>>>> 14923.58699717058
>>>>>> 0
>>>>>> 0
>>>>>> 0
>>>>>> -60554544523933395e-22
>>>>>>
>>>>>> 14923.54224130672
>>>>>> 4-element Array{Float64,1}:
>>>>>>   -100.609
>>>>>>    734.0
>>>>>>  11470.7
>>>>>>   3695.5
>>>>>>
>>>>>> function clogit_ll(beta::Vector)
>>>>>>
>>>>>>     # Print the parameters and the return value to
>>>>>>     # check how gradient() and optimize() work.
>>>>>>     println(beta) 
>>>>>>     println(-sum(compute_ll(beta,T,0)))
>>>>>>
>>>>>>     # compute_ll computes the individual likelihood contributions
>>>>>>     # in the sample. T is the number of periods in the panel. The 0
>>>>>>     # is not used in this simple example. In related functions, I
>>>>>>     # pass on different values here to estimate finite mixtures of
>>>>>>     # the conditional logit model.
>>>>>>     return -sum(compute_ll(beta,T,0))
>>>>>> end
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> -- 
>>>>> Med venlig hilsen
>>>>>
>>>>> Andreas Noack Jensen
>>>>>  
>>>>>
>>>>>
>>>>

Re: [julia-users] Optim.jl: unexpected results when using gradient-based methods

Reply via email to