Re: [julia-users] Optim.jl: unexpected results when using gradient-based methods

Holger Stichnoth Thu, 22 May 2014 03:39:07 -0700

@ John: You are right, when I specify the function as 
clogit_ll(beta::Vector) instead of clogit_ll(beta::Vector{Float64}), 
autodiff = true works fine. Thanks for your help!


@ Tim: I have set the rather strict default convergence criteria of 
Optim.optimize to Matlab's default values for fminunc, but the speed 
difference is still there.

@ Miles/John: Getting rid of the global variables through closures and 
devectorizing made the optimization _slower_ not faster in my case: 
https://gist.github.com/stichnoth/7f251ded83dcaa384273. I was surprised to 
see this as I expected a speed increase myself. 
 

Am Mittwoch, 21. Mai 2014 16:48:51 UTC+1 schrieb Miles Lubin:
>
> Just to extend on what John said, also think that if you can restructure 
> the code to devectorize it and avoid using global variables, you'll see 
> *much* better performance. 
>
> The way to avoid globals is by using closures, for example:
> function foo(x, data)
>     ...
> end
>
>
> ...
> data_raw = readcsv(file)
> data = reshape(data_raw, nObs, nChoices*(1+nVar), T)
>
>
>
> Optim.optimize(x-> foo(x,data), ...)
>
>
>
> On Tuesday, May 20, 2014 11:47:39 AM UTC-4, John Myles White wrote:
>>
>> Glad that you were able to figure out the source of your problems.
>>
>> It would be good to get a sense of the amount of time spent inside your 
>> objective function vs. the amount of time spent in the code for optimize(). 
>> In general, my experience is that >>90% of the compute time for an 
>> optimization problem is spent in the objective function itself. If you 
>> instrument your objective function to produce timing information on each 
>> call, that would help a lot since you could then get a sense of how much 
>> time is being spent in the code for optimize() after accounting for your 
>> function itself.
>>
>> It’s also worth keeping in mind that your use of implicit finite 
>> differencing means that your objective function is being called a lot more 
>> times than theoretically necessary, so that any minor performance issue in 
>> it will very substantially slow down the solver.
>>
>> Regarding you objective function’s code, I suspect that the combination 
>> of global variables and memory-allocating vectorized arithmetic means that 
>> your objective function might be a good bit slower in Julia than in Matlab. 
>> Matlab seems to be a little better about garbage collection for vectorized 
>> arithmetic and Julia is generally not able to optimize code involving 
>> global variables.
>>
>> Hope that points you in the right direction.
>>
>>  — John
>>
>> On May 20, 2014, at 8:34 AM, Holger Stichnoth <[email protected]> wrote:
>>
>> Hi Andreas,
>> hi John,
>> hi Miles (via julia-opt, where I mistakenly also posted my question 
>> yesterday),
>>
>> Thanks for your help. Here is the link to the Gist I created: 
>> https://gist.github.com/anonymous/5f95ab1afd241c0a5962
>>
>> In the process of producing a minimal (non-)working example, I discovered 
>> that the unexpected results are due to the truncation of the logit choice 
>> probabilities in the objective function. Optim.optimize() is sensitive to 
>> this when method = :l_bfgs is used. With method = :nelder_mead, everything 
>> works fine. When I comment out the truncation, :l_bfgs works as well. 
>> However, I need to increase the xtol from its default of 1e-12 to at least 
>> 1e-10, otherwise I get the error that the linesearch failed to converge.
>>
>> I guess I should just do without the truncation. The logit probabilities 
>> are between 0 and 1 by construction anyway. I had just copied the 
>> truncation code from a friend who had told me that probabilities that are 
>> too close to 0 or 1 sometimes cause numerical problems in his Matlab code 
>> of the same function. With Optim.optimize(), it seems to be the other way 
>> around, i.e. moving the probabilities further away from 0 or 1 (even by 
>> tiny amounts) means that the stability of the (gradient-based) algorithm is 
>> reduced.
>>
>> So for me, the problem is solved. The problem was not with Optim.jl, but 
>> with my own code.
>>
>> The only other thing that I discovered when trying out Julia and Optim.jl 
>> is that the optimization is currently considerably slower than Matlab's 
>> fminunc. From the Gist I provided above, are there any potential 
>> performance improvements that I am missing out on?
>>
>> Best wishes,
>> Holger
>>
>>
>> On Monday, 19 May 2014 14:51:16 UTC+1, John Myles White wrote:
>>>
>>> If you can, please do share an example of your code. Logit-style models 
>>> are in general numerically unstable, so it would be good to see how exactly 
>>> you’ve coded things up.
>>>
>>> One thing you may be able to do is use automatic differentiation via the 
>>> autodiff = true keyword to optimize, but that assumes that your objective 
>>> function is written in completely pure Julia code (which means, for 
>>> example, that your code must not call any of functions not written in Julia 
>>> provided by Distributions.jl).
>>>
>>>  — John
>>>
>>> On May 19, 2014, at 4:09 AM, Andreas Noack Jensen <
>>> [email protected]> wrote:
>>>
>>> What is the output of versioninfo() and Pkg.installed("Optim")? Also, 
>>> would it be possible to make a gist with your code?
>>>
>>>
>>> 2014-05-19 12:44 GMT+02:00 Holger Stichnoth <[email protected]>:
>>>
>>>>  Hello,
>>>>
>>>> I installed Julia a couple of days ago and was impressed how easy it 
>>>> was to make the switch from Matlab and to parallelize my code
>>>> (something I had never done before in any language; I'm an economist 
>>>> with only limited programming experience, mainly in Stata and Matlab).
>>>>
>>>> However, I ran into a problem when using Optim.jl for Maximum 
>>>> Likelihood estimation of a conditional logit model. With the default 
>>>> Nelder-Mead algorithm, optimize from the Optim.jl package gave me the same 
>>>> result that I had obtained in Stata and Matlab.
>>>>
>>>> With gradient-based methods such as BFGS, however, the algorithm jumped 
>>>> from the starting values to parameter values that are completely 
>>>> different. 
>>>> This happened for all thr starting values I tried, including the case in 
>>>> which I took a vector that is closed to the optimum from the Nelder-Mead 
>>>> algorithm.  
>>>>
>>>> The problem seems to be that the algorithm tried values so large (in 
>>>> absolute value) that this caused problems for the objective
>>>> function, where I call exponential functions into which these parameter 
>>>> values enter. As a result, the optimization based on the BFGS algorithm 
>>>> did 
>>>> not produce the expected optimum.
>>>>
>>>> While I could try to provide the analytical gradient in this simple 
>>>> case, I was planning to use Julia for Maximum Likelihood or Simulated 
>>>> Maximum Likelihood estimation in cases where the gradient is more 
>>>> difficult 
>>>> to derive, so it would be good if I could make the optimizer run also with 
>>>> numerical gradients.
>>>>
>>>> I suspect that my problems with optimize from Optim.jl could have 
>>>> something to do with the gradient() function. In the example below, for 
>>>> instance, I do not understand why the output of the gradient function 
>>>> includes values such as 11470.7, given that the function values differ 
>>>> only 
>>>> minimally.
>>>>
>>>> Best wishes,
>>>> Holger
>>>>
>>>>
>>>> julia> Optim.gradient(clogit_ll,zeros(4))
>>>> 60554544523933395e-22
>>>> 0Op
>>>> 0
>>>> 0
>>>>
>>>> 14923.564009972584
>>>> -60554544523933395e-22
>>>> 0
>>>> 0
>>>> 0
>>>>
>>>> 14923.565228435104
>>>> 0
>>>> 60554544523933395e-22
>>>> 0
>>>> 0
>>>>
>>>> 14923.569064311248
>>>> 0
>>>> -60554544523933395e-22
>>>> 0
>>>> 0
>>>>
>>>> 14923.560174904109
>>>> 0
>>>> 0
>>>> 60554544523933395e-22
>>>> 0
>>>>
>>>> 14923.63413848258
>>>> 0
>>>> 0
>>>> -60554544523933395e-22
>>>> 0
>>>>
>>>> 14923.495218282553
>>>> 0
>>>> 0
>>>> 0
>>>> 60554544523933395e-22
>>>>
>>>> 14923.58699717058
>>>> 0
>>>> 0
>>>> 0
>>>> -60554544523933395e-22
>>>>
>>>> 14923.54224130672
>>>> 4-element Array{Float64,1}:
>>>>   -100.609
>>>>    734.0
>>>>  11470.7
>>>>   3695.5
>>>>
>>>> function clogit_ll(beta::Vector)
>>>>
>>>>     # Print the parameters and the return value to
>>>>     # check how gradient() and optimize() work.
>>>>     println(beta) 
>>>>     println(-sum(compute_ll(beta,T,0)))
>>>>
>>>>     # compute_ll computes the individual likelihood contributions
>>>>     # in the sample. T is the number of periods in the panel. The 0
>>>>     # is not used in this simple example. In related functions, I
>>>>     # pass on different values here to estimate finite mixtures of
>>>>     # the conditional logit model.
>>>>     return -sum(compute_ll(beta,T,0))
>>>> end
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> -- 
>>> Med venlig hilsen
>>>
>>> Andreas Noack Jensen
>>>  
>>>
>>>
>>

Re: [julia-users] Optim.jl: unexpected results when using gradient-based methods

Reply via email to