Hello,
I installed Julia a couple of days ago and was impressed by how easy it was to
make the switch from Matlab and to parallelize my code
(something I had never done before in any language; I'm an economist with
only limited programming experience, mainly in Stata and Matlab).
However, I ran into a problem when using Optim.jl for Maximum Likelihood
estimation of a conditional logit model. With the default Nelder-Mead
algorithm, optimize from the Optim.jl package gave me the same result that
I had obtained in Stata and Matlab.
With gradient-based methods such as BFGS, however, the algorithm jumped
from the starting values to completely different parameter values.
This happened for all the starting values I tried, including a vector
close to the optimum returned by the Nelder-Mead algorithm.
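For reference, the calls look roughly like this (a simplified sketch using the
current Optim.jl syntax rather than my exact code; clogit_ll is the objective
function given at the end of this message):

using Optim

# Starting values: a zero vector of length 4, as in the output below.
beta0 = zeros(4)

# Derivative-free Nelder-Mead: this is the variant that reproduces
# the Stata/Matlab estimates.
res_nm = optimize(clogit_ll, beta0, NelderMead())

# BFGS with a numerical (finite-difference) gradient: this is the call
# that jumps to very large parameter values.
res_bfgs = optimize(clogit_ll, beta0, BFGS())

println(Optim.minimizer(res_nm))
println(Optim.minimizer(res_bfgs))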
The problem seems to be that the algorithm tries parameter values so large
(in absolute value) that they cause problems in the objective function,
which takes exponentials of terms into which these parameter values enter.
As a result, the optimization based on the BFGS algorithm did not produce
the expected optimum.
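To illustrate what I mean with a toy example (made-up covariates, not my actual
compute_ll): once the linear index becomes large, the exponentials overflow and
the log-likelihood contribution turns into NaN, which presumably breaks the
line search:

# Toy choice probability for one observation with two alternatives;
# x1 and x2 are made-up covariate vectors, not taken from my model.
x1, x2 = [1.0, 2.0], [0.5, -1.0]
choiceprob(beta) = exp(sum(beta .* x1)) / (exp(sum(beta .* x1)) + exp(sum(beta .* x2)))

println(log(choiceprob([0.1, 0.2])))     # well-behaved: about -0.42
println(log(choiceprob([500.0, 300.0]))) # exp() overflows: Inf/Inf = NaN, log(NaN) = NaN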
While I could try to provide the analytical gradient in this simple case, I
was planning to use Julia for Maximum Likelihood or Simulated Maximum
Likelihood estimation in cases where the gradient is more difficult to
derive, so it would be good if I could get the optimizer to work with
numerical gradients as well.
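For reference, this is the general calling pattern for supplying an analytical
gradient in Optim.jl (a self-contained toy example with a quadratic objective,
not my model; Optim expects an in-place gradient function g!(G, x) that fills G):

using Optim

# Toy objective: a simple quadratic, just to show the calling pattern.
f(x) = sum(abs2, x .- 1.0)

# In-place analytical gradient: optimize calls g!(G, x) and expects G to be filled.
function g!(G, x)
    G .= 2.0 .* (x .- 1.0)
end

res = optimize(f, g!, zeros(4), BFGS())
println(Optim.minimizer(res))  # should be close to [1.0, 1.0, 1.0, 1.0]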
I suspect that my problems with optimize from Optim.jl could have something
to do with the gradient() function. In the example below, for instance, I
do not understand why the output of the gradient function includes values
such as 11470.7, given that the function values differ only minimally.
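I do not know exactly how gradient() computes its result, but if it uses
central differences with the step size that appears in the printed vectors,
then numbers of this size would simply come from dividing the small differences
in function values by a very small step, e.g. for the third parameter:

# Central-difference quotient for the third parameter, using the step size
# that shows up in the printed vectors and the two function values printed
# for that perturbation (taken from the output below).
h = 6.0554544523933395e-6
f_plus  = 14923.63413848258   # objective at beta[3] = +h
f_minus = 14923.495218282553  # objective at beta[3] = -h
println((f_plus - f_minus) / (2h))  # ≈ 11470.7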
Best wishes,
Holger
julia> Optim.gradient(clogit_ll,zeros(4))
[6.0554544523933395e-6,0.0,0.0,0.0]
14923.564009972584
[-6.0554544523933395e-6,0.0,0.0,0.0]
14923.565228435104
[0.0,6.0554544523933395e-6,0.0,0.0]
14923.569064311248
[0.0,-6.0554544523933395e-6,0.0,0.0]
14923.560174904109
[0.0,0.0,6.0554544523933395e-6,0.0]
14923.63413848258
[0.0,0.0,-6.0554544523933395e-6,0.0]
14923.495218282553
[0.0,0.0,0.0,6.0554544523933395e-6]
14923.58699717058
[0.0,0.0,0.0,-6.0554544523933395e-6]
14923.54224130672
4-element Array{Float64,1}:
  -100.609
   734.0
 11470.7
  3695.5
function clogit_ll(beta::Vector)
    # Print the parameters and the return value to
    # check how gradient() and optimize() work.
    # compute_ll computes the individual likelihood contributions
    # in the sample. T is the number of periods in the panel. The 0
    # is not used in this simple example. In related functions, I
    # pass on different values here to estimate finite mixtures of
    # the conditional logit model.
    ll = -sum(compute_ll(beta, T, 0))
    println(beta)
    println(ll)
    return ll
end