I was able to get a nearly 5x speedup by avoiding the matrix allocation and
making the accumulators type
stable: https://gist.github.com/mlubin/055690ddf2466e98bba6
How does this compare with Matlab now?
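
For anyone curious, the two changes look roughly like this (a made-up
function, not the code from the gist): an explicit loop avoids allocating a
temporary X*beta vector on every call, and starting the accumulator at 0.0
keeps it type stable.

function xb_sum(beta::Vector{Float64}, X::Matrix{Float64})
    s = 0.0                          # Float64 accumulator; `s = 0` would start out as an Int
    for i = 1:size(X, 1)
        xb = 0.0
        for k = 1:length(beta)
            xb += X[i, k] * beta[k]  # no temporary X*beta array is allocated
        end
        s += xb
    end
    return s
end

The vectorized equivalent, sum(X * beta), allocates a fresh vector of length
size(X, 1) on every evaluation of the objective.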
On Thursday, May 22, 2014 6:38:44 AM UTC-4, Holger Stichnoth wrote:
>
> @ John: You are right, when I specify the function as
> clogit_ll(beta::Vector) instead of clogit_ll(beta::Vector{Float64}),
> autodiff = true works fine. Thanks for your help!
>
> @ Tim: I have set the rather strict default convergence criteria of
> Optim.optimize to Matlab's default values for fminunc, but the speed
> difference is still there.
>
> @ Miles/John: Getting rid of the global variables through closures and
> devectorizing made the optimization _slower_ not faster in my case:
> https://gist.github.com/stichnoth/7f251ded83dcaa384273. I was surprised
> to see this as I expected a speed increase myself.
>
>
> Am Mittwoch, 21. Mai 2014 16:48:51 UTC+1 schrieb Miles Lubin:
>>
>> Just to expand on what John said, I also think that if you can restructure
>> the code to devectorize it and avoid using global variables, you'll see
>> *much* better performance.
>>
>> The way to avoid globals is by using closures, for example:
>> function foo(x, data)
>> ...
>> end
>>
>> ...
>> data_raw = readcsv(file)
>> data = reshape(data_raw, nObs, nChoices*(1+nVar), T)
>>
>> # the anonymous function below captures `data`, so foo receives it
>> # without any global variables:
>> Optim.optimize(x -> foo(x, data), ...)
>>
>> On Tuesday, May 20, 2014 11:47:39 AM UTC-4, John Myles White wrote:
>>>
>>> Glad that you were able to figure out the source of your problems.
>>>
>>> It would be good to get a sense of the amount of time spent inside your
>>> objective function vs. the amount of time spent in the code for optimize().
>>> In general, my experience is that >>90% of the compute time for an
>>> optimization problem is spent in the objective function itself. If you
>>> instrument your objective function to produce timing information on each
>>> call, that would help a lot since you could then get a sense of how much
>>> time is being spent in the code for optimize() after accounting for your
>>> function itself.
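>>>
>>> A simple way to do that (purely illustrative; clogit_ll stands in for your
>>> objective) is to wrap the function so that it accumulates a call count and
>>> the wall-clock time spent inside it:
>>>
>>> calls = 0
>>> obj_time = 0.0
>>> function clogit_ll_timed(beta)
>>>     global calls, obj_time
>>>     t0 = time()
>>>     v = clogit_ll(beta)          # the actual objective
>>>     obj_time += time() - t0      # seconds spent inside the objective so far
>>>     calls += 1
>>>     return v
>>> end
>>>
>>> Passing clogit_ll_timed to optimize() and comparing obj_time with the total
>>> run time shows how much overhead optimize() itself adds.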
>>>
>>> It’s also worth keeping in mind that your use of implicit finite
>>> differencing means that your objective function is being called a lot more
>>> times than theoretically necessary (a central-difference gradient for your
>>> 4-parameter model already costs 8 objective evaluations), so that any minor
>>> performance issue in it will very substantially slow down the solver.
>>>
>>> Regarding your objective function’s code, I suspect that the combination
>>> of global variables and memory-allocating vectorized arithmetic means that
>>> your objective function might be a good bit slower in Julia than in Matlab.
>>> Matlab seems to be a little better about garbage collection for vectorized
>>> arithmetic and Julia is generally not able to optimize code involving
>>> global variables.
>>>
>>> Hope that points you in the right direction.
>>>
>>> — John
>>>
>>> On May 20, 2014, at 8:34 AM, Holger Stichnoth <[email protected]> wrote:
>>>
>>> Hi Andreas,
>>> hi John,
>>> hi Miles (via julia-opt, where I mistakenly also posted my question
>>> yesterday),
>>>
>>> Thanks for your help. Here is the link to the Gist I created:
>>> https://gist.github.com/anonymous/5f95ab1afd241c0a5962
>>>
>>> In the process of producing a minimal (non-)working example, I
>>> discovered that the unexpected results are due to the truncation of the
>>> logit choice probabilities in the objective function. Optim.optimize() is
>>> sensitive to this when method = :l_bfgs is used. With method =
>>> :nelder_mead, everything works fine. When I comment out the truncation,
>>> :l_bfgs works as well. However, I need to increase the xtol from its
>>> default of 1e-12 to at least 1e-10, otherwise I get the error that the
>>> linesearch failed to converge.
>>>
>>> I guess I should just do without the truncation. The logit probabilities
>>> are between 0 and 1 by construction anyway. I had just copied the
>>> truncation code from a friend who had told me that probabilities that are
>>> too close to 0 or 1 sometimes cause numerical problems in his Matlab code
>>> of the same function. With Optim.optimize(), it seems to be the other way
>>> around, i.e. moving the probabilities further away from 0 or 1 (even by
>>> tiny amounts) means that the stability of the (gradient-based) algorithm is
>>> reduced.
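>>>
>>> (The truncation was essentially a clamp of the following form; the constant
>>> here is only illustrative:
>>>
>>> truncate_p(p) = min(max(p, 1e-12), 1.0 - 1e-12)   # keep p away from exactly 0 and 1
>>>
>>> Wherever the bound is active the objective becomes locally flat, which is
>>> presumably what upsets the finite-difference gradient and the line search.)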
>>>
>>> So for me, the problem is solved. The problem was not with Optim.jl, but
>>> with my own code.
>>>
>>> The only other thing that I discovered when trying out Julia and
>>> Optim.jl is that the optimization is currently considerably slower than
>>> Matlab's fminunc. From the Gist I provided above, are there any potential
>>> performance improvements that I am missing out on?
>>>
>>> Best wishes,
>>> Holger
>>>
>>>
>>> On Monday, 19 May 2014 14:51:16 UTC+1, John Myles White wrote:
>>>>
>>>> If you can, please do share an example of your code. Logit-style models
>>>> are in general numerically unstable, so it would be good to see how
>>>> exactly
>>>> you’ve coded things up.
>>>>
>>>> One thing you may be able to do is use automatic differentiation via
>>>> the autodiff = true keyword to optimize, but that assumes that your
>>>> objective function is written in completely pure Julia code (which means,
>>>> for example, that your code must not call any functions that are not
>>>> written in Julia, such as some of those provided by Distributions.jl).
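>>>>
>>>> For instance, with a 4-parameter model started at zero (note that the
>>>> objective then has to accept a generic Vector rather than Vector{Float64}):
>>>>
>>>> Optim.optimize(clogit_ll, zeros(4), method = :l_bfgs, autodiff = true)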
>>>>
>>>> — John
>>>>
>>>> On May 19, 2014, at 4:09 AM, Andreas Noack Jensen <
>>>> [email protected]> wrote:
>>>>
>>>> What is the output of versioninfo() and Pkg.installed("Optim")? Also,
>>>> would it be possible to make a gist with your code?
>>>>
>>>>
>>>> 2014-05-19 12:44 GMT+02:00 Holger Stichnoth <[email protected]>:
>>>>
>>>>> Hello,
>>>>>
>>>>> I installed Julia a couple of days ago and was impressed how easy it
>>>>> was to make the switch from Matlab and to parallelize my code
>>>>> (something I had never done before in any language; I'm an economist
>>>>> with only limited programming experience, mainly in Stata and Matlab).
>>>>>
>>>>> However, I ran into a problem when using Optim.jl for Maximum
>>>>> Likelihood estimation of a conditional logit model. With the default
>>>>> Nelder-Mead algorithm, optimize from the Optim.jl package gave me the
>>>>> same
>>>>> result that I had obtained in Stata and Matlab.
>>>>>
>>>>> With gradient-based methods such as BFGS, however, the algorithm
>>>>> jumped from the starting values to parameter values that are completely
>>>>> different. This happened for all the starting values I tried, including
>>>>> the case in which I took a vector that is close to the optimum from the
>>>>> Nelder-Mead algorithm.
>>>>>
>>>>> The problem seems to be that the algorithm tried values so large (in
>>>>> absolute value) that they caused numerical problems in the objective
>>>>> function, where these parameter values enter exponential functions. As
>>>>> a result, the optimization based on the BFGS algorithm did not produce
>>>>> the expected optimum.
>>>>>
>>>>> While I could try to provide the analytical gradient in this simple
>>>>> case, I was planning to use Julia for Maximum Likelihood or Simulated
>>>>> Maximum Likelihood estimation in cases where the gradient is more
>>>>> difficult
>>>>> to derive, so it would be good if I could make the optimizer run also
>>>>> with
>>>>> numerical gradients.
>>>>>
>>>>> I suspect that my problems with optimize from Optim.jl could have
>>>>> something to do with the gradient() function. In the example below, for
>>>>> instance, I do not understand why the output of the gradient function
>>>>> includes values such as 11470.7, given that the function values differ
>>>>> only
>>>>> minimally.
>>>>>
>>>>> Best wishes,
>>>>> Holger
>>>>>
>>>>>
>>>>> julia> Optim.gradient(clogit_ll,zeros(4))
>>>>> 60554544523933395e-22
>>>>> 0
>>>>> 0
>>>>> 0
>>>>>
>>>>> 14923.564009972584
>>>>> -60554544523933395e-22
>>>>> 0
>>>>> 0
>>>>> 0
>>>>>
>>>>> 14923.565228435104
>>>>> 0
>>>>> 60554544523933395e-22
>>>>> 0
>>>>> 0
>>>>>
>>>>> 14923.569064311248
>>>>> 0
>>>>> -60554544523933395e-22
>>>>> 0
>>>>> 0
>>>>>
>>>>> 14923.560174904109
>>>>> 0
>>>>> 0
>>>>> 60554544523933395e-22
>>>>> 0
>>>>>
>>>>> 14923.63413848258
>>>>> 0
>>>>> 0
>>>>> -60554544523933395e-22
>>>>> 0
>>>>>
>>>>> 14923.495218282553
>>>>> 0
>>>>> 0
>>>>> 0
>>>>> 60554544523933395e-22
>>>>>
>>>>> 14923.58699717058
>>>>> 0
>>>>> 0
>>>>> 0
>>>>> -60554544523933395e-22
>>>>>
>>>>> 14923.54224130672
>>>>> 4-element Array{Float64,1}:
>>>>> -100.609
>>>>> 734.0
>>>>> 11470.7
>>>>> 3695.5
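>>>>>
>>>>> # (For reference, the 11470.7 matches the central difference of the two
>>>>> # function values printed above for the third coordinate, with a step of
>>>>> # only about 6.055e-6:)
>>>>> (14923.63413848258 - 14923.495218282553) / (2 * 6.0554544523933395e-6)  # ≈ 11470.7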
>>>>>
>>>>> function clogit_ll(beta::Vector)
>>>>>
>>>>>     # Print the parameters and the return value to
>>>>>     # check how gradient() and optimize() work.
>>>>>     println(beta)
>>>>>     println(-sum(compute_ll(beta,T,0)))
>>>>>
>>>>>     # compute_ll computes the individual likelihood contributions
>>>>>     # in the sample. T is the number of periods in the panel. The 0
>>>>>     # is not used in this simple example. In related functions, I
>>>>>     # pass on different values here to estimate finite mixtures of
>>>>>     # the conditional logit model.
>>>>>     return -sum(compute_ll(beta,T,0))
>>>>> end
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Kind regards
>>>>
>>>> Andreas Noack Jensen
>>>>
>>>>
>>>>
>>>