If you can, please do share an example of your code. Logit-style models are in 
general numerically unstable, so it would be good to see how exactly you’ve 
coded things up.

One thing you may be able to do is use automatic differentiation via the 
autodiff = true keyword to optimize, but that assumes that your objective 
function is written in completely pure Julia code (which means, for example, 
that your code must not call any functions that are not written in Julia, 
such as some of those provided by Distributions.jl).
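
To see why purity matters, here is a toy sketch (not Optim's actual 
implementation) of forward-mode automatic differentiation: the derivative is 
carried through every Julia operation by operator overloading, so a call into 
compiled C code would silently drop it.

```julia
# Minimal forward-mode "dual number": carries a value and its derivative.
struct Dual
    val::Float64  # function value
    der::Float64  # derivative value
end

# Each overloaded operation propagates the derivative by the chain rule.
Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)
Base.exp(a::Dual) = Dual(exp(a.val), exp(a.val) * a.der)

# d/dx of x * exp(x) at x = 1 is exp(1) * (1 + 1) = 2e.
x = Dual(1.0, 1.0)   # seed the derivative with 1
y = x * exp(x)
println(y.der)       # ≈ 2 * exp(1) ≈ 5.4366
```

A function calling out to non-Julia code never hits these overloaded methods, 
which is exactly the restriction described above.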

 — John

On May 19, 2014, at 4:09 AM, Andreas Noack Jensen 
<[email protected]> wrote:

> What is the output of versioninfo() and Pkg.installed("Optim")? Also, would 
> it be possible to make a gist with your code?
> 
> 
> 2014-05-19 12:44 GMT+02:00 Holger Stichnoth <[email protected]>:
>  Hello,
> 
> I installed Julia a couple of days ago and was impressed by how easy it was 
> to make the switch from Matlab and to parallelize my code (something I had 
> never done before in any language; I'm an economist with only limited 
> programming experience, mainly in Stata and Matlab).
> 
> However, I ran into a problem when using Optim.jl for Maximum Likelihood 
> estimation of a conditional logit model. With the default Nelder-Mead 
> algorithm, optimize from the Optim.jl package gave me the same result that I 
> had obtained in Stata and Matlab.
> 
> With gradient-based methods such as BFGS, however, the algorithm jumped from 
> the starting values to parameter values that are completely different. This 
> happened for all the starting values I tried, including the case in which I 
> took a vector that is close to the optimum from the Nelder-Mead algorithm.
> 
> The problem seems to be that the algorithm tried values so large (in 
> absolute value) that they caused overflow in the objective function, where 
> these parameter values enter exponential functions. As a result, the 
> optimization based on the BFGS algorithm did not produce the expected 
> optimum.
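
The overflow mode described here is easy to reproduce: in a conditional logit, 
each choice probability is exp(v_j) / Σ_k exp(v_k), and once any v_k exceeds 
roughly 709, exp overflows Float64 to Inf and the log-likelihood becomes NaN. 
A sketch with hypothetical utilities (not the actual model), including the 
standard log-sum-exp fix:

```julia
# Hypothetical linear indices x'beta for three alternatives, made large
# the way an aggressive BFGS step can make them.
v = [800.0, 802.0, 795.0]

# Naive log-probability of alternative 1: Inf / Inf -> NaN.
naive = log(exp(v[1]) / sum(exp.(v)))

# Log-sum-exp trick: subtract the maximum first, so every exponent is
# <= 0 and nothing overflows.
m = maximum(v)
stable = v[1] - (m + log(sum(exp.(v .- m))))

println(naive)    # NaN
println(stable)   # ≈ -2.128
```

Rewriting the likelihood this way usually lets gradient-based optimizers take 
large trial steps without the objective degenerating.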
> 
> While I could try to provide the analytical gradient in this simple case, I 
> was planning to use Julia for Maximum Likelihood or Simulated Maximum 
> Likelihood estimation in cases where the gradient is more difficult to 
> derive, so it would be good if I could also make the optimizer work with 
> numerical gradients.
> 
> I suspect that my problems with optimize from Optim.jl could have something 
> to do with the gradient() function. In the example below, for instance, I do 
> not understand why the output of the gradient function includes values such 
> as 11470.7, given that the function values differ only minimally.
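
Large finite-difference gradient entries like these are consistent with how a 
central difference amplifies small changes in a steep function: the estimate 
is a difference of nearly equal numbers divided by a tiny step. A toy 
illustration in plain Julia (f is hypothetical, not the actual clogit_ll):

```julia
# A steep toy objective and a central finite difference.
f(x) = exp(10.0 * x)
fd(f, x, h) = (f(x + h) - f(x - h)) / (2h)

exact = 10.0 * exp(10.0)          # f'(1) = 10 e^10

println(fd(f, 1.0, 1e-6))         # very close to exact
println(fd(f, 1.0, 1e-1))         # ~17% relative error from truncation
```

So a function value that "differs only minimally" can still produce a gradient 
entry in the thousands once divided by a step on the order of 1e-6.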
> 
> Best wishes,
> Holger
> 
> 
> julia> Optim.gradient(clogit_ll,zeros(4))
> [6.0554544523933395e-6,0.0,0.0,0.0]
> 14923.564009972584
> [-6.0554544523933395e-6,0.0,0.0,0.0]
> 14923.565228435104
> [0.0,6.0554544523933395e-6,0.0,0.0]
> 14923.569064311248
> [0.0,-6.0554544523933395e-6,0.0,0.0]
> 14923.560174904109
> [0.0,0.0,6.0554544523933395e-6,0.0]
> 14923.63413848258
> [0.0,0.0,-6.0554544523933395e-6,0.0]
> 14923.495218282553
> [0.0,0.0,0.0,6.0554544523933395e-6]
> 14923.58699717058
> [0.0,0.0,0.0,-6.0554544523933395e-6]
> 14923.54224130672
> 4-element Array{Float64,1}:
>   -100.609
>    734.0
>  11470.7
>   3695.5
> 
> function clogit_ll(beta::Vector)
> 
>     # Print the parameters and the return value to
>     # check how gradient() and optimize() work.
>     println(beta) 
>     println(-sum(compute_ll(beta,T,0)))
> 
>     # compute_ll computes the individual likelihood contributions
>     # in the sample. T is the number of periods in the panel. The 0
>     # is not used in this simple example. In related functions, I
>     # pass on different values here to estimate finite mixtures of
>     # the conditional logit model.
>     return -sum(compute_ll(beta,T,0))
> end
> 
> 
> -- 
> Med venlig hilsen
> 
> Andreas Noack Jensen
