Hello,
I installed Julia a couple of days ago and was impressed by how easy it was to
make the switch from Matlab and to parallelize my code
(something I had never done before in any language; I'm an economist with
only limited programming experience, mainly in Stata and Matlab).
However, I ran into a problem when using Optim.jl for Maximum Likelihood
estimation of a conditional logit model. With the default Nelder-Mead
algorithm, optimize from the Optim.jl package gave me the same result that
I had obtained in Stata and Matlab.
With gradient-based methods such as BFGS, however, the algorithm jumped
from the starting values to completely different parameter values.
This happened for all the starting values I tried, including a vector
close to the optimum returned by the Nelder-Mead algorithm.
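For reference, the calls look roughly like this (a simplified sketch using the
current Optim.jl syntax rather than my exact code; clogit_ll is the objective
function given at the end of this message):

using Optim

# Starting values: a zero vector of length 4, as in the output below.
beta0 = zeros(4)

# Derivative-free Nelder-Mead: this is the variant that reproduces
# the Stata/Matlab estimates.
res_nm = optimize(clogit_ll, beta0, NelderMead())

# BFGS with a numerical (finite-difference) gradient: this is the call
# that jumps to very large parameter values.
res_bfgs = optimize(clogit_ll, beta0, BFGS())

println(Optim.minimizer(res_nm))
println(Optim.minimizer(res_bfgs))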
The problem seems to be that the algorithm tries parameter values so large
(in absolute value) that they cause problems in the objective function,
which takes exponentials of terms into which these parameter values enter.
As a result, the optimization based on the BFGS algorithm did not produce
the expected optimum.
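To illustrate what I mean with a toy example (made-up covariates, not my actual
compute_ll): once the linear index becomes large, the exponentials overflow and
the log-likelihood contribution turns into NaN, which presumably breaks the
line search:

# Toy choice probability for one observation with two alternatives;
# x1 and x2 are made-up covariate vectors, not taken from my model.
x1, x2 = [1.0, 2.0], [0.5, -1.0]
choiceprob(beta) = exp(sum(beta .* x1)) / (exp(sum(beta .* x1)) + exp(sum(beta .* x2)))

println(log(choiceprob([0.1, 0.2])))     # well-behaved: about -0.42
println(log(choiceprob([500.0, 300.0]))) # exp() overflows: Inf/Inf = NaN, log(NaN) = NaN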
While I could try to provide the analytical gradient in this simple case, I
was planning to use Julia for Maximum Likelihood or Simulated Maximum
Likelihood estimation in cases where the gradient is more difficult to
derive, so it would be good if I could get the optimizer to work with
numerical gradients as well.
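For reference, this is the general calling pattern for supplying an analytical
gradient in Optim.jl (a self-contained toy example with a quadratic objective,
not my model; Optim expects an in-place gradient function g!(G, x) that fills G):

using Optim

# Toy objective: a simple quadratic, just to show the calling pattern.
f(x) = sum(abs2, x .- 1.0)

# In-place analytical gradient: optimize calls g!(G, x) and expects G to be filled.
function g!(G, x)
    G .= 2.0 .* (x .- 1.0)
end

res = optimize(f, g!, zeros(4), BFGS())
println(Optim.minimizer(res))  # should be close to [1.0, 1.0, 1.0, 1.0]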
I suspect that my problems with optimize from Optim.jl could have something
to do with the gradient() function. In the example below, for instance, I
do not understand why the output of the gradient function includes values
such as 11470.7, given that the function values differ only minimally.
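I do not know exactly how gradient() computes its result, but if it uses
central differences with the step size that appears in the printed vectors,
then numbers of this size would simply come from dividing the small differences
in function values by a very small step, e.g. for the third parameter:

# Central-difference quotient for the third parameter, using the step size
# that shows up in the printed vectors and the two function values printed
# for that perturbation (taken from the output below).
h = 6.0554544523933395e-6
f_plus  = 14923.63413848258   # objective at beta[3] = +h
f_minus = 14923.495218282553  # objective at beta[3] = -h
println((f_plus - f_minus) / (2h))  # ≈ 11470.7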
Best wishes,
Holger
julia> Optim.gradient(clogit_ll,zeros(4))
[6.0554544523933395e-6,0.0,0.0,0.0]
14923.564009972584
[-6.0554544523933395e-6,0.0,0.0,0.0]
14923.565228435104
[0.0,6.0554544523933395e-6,0.0,0.0]
14923.569064311248
[0.0,-6.0554544523933395e-6,0.0,0.0]
14923.560174904109
[0.0,0.0,6.0554544523933395e-6,0.0]
14923.63413848258
[0.0,0.0,-6.0554544523933395e-6,0.0]
14923.495218282553
[0.0,0.0,0.0,6.0554544523933395e-6]
14923.58699717058
[0.0,0.0,0.0,-6.0554544523933395e-6]
14923.54224130672
4-element Array{Float64,1}:
  -100.609
   734.0
 11470.7
  3695.5
function clogit_ll(beta::Vector)
    # Print the parameters and the return value to
    # check how gradient() and optimize() work.
    # compute_ll computes the individual likelihood contributions
    # in the sample. T is the number of periods in the panel. The 0
    # is not used in this simple example. In related functions, I
    # pass on different values here to estimate finite mixtures of
    # the conditional logit model.
    ll = -sum(compute_ll(beta, T, 0))
    println(beta)
    println(ll)
    return ll
end