Hi John, hi Miles,

Thanks to both of you. I did not have time to look into this over the weekend; I will do so in the next couple of days. I have already uploaded the Matlab files for comparison: https://gist.github.com/stichnoth/7f251ded83dcaa384273

Holger

On Thursday, 22 May 2014 23:03:58 UTC+1, John Myles White wrote:
>
> Yeah, this case is tricky enough that we really need to get down to the lowest details:
>
> (1) Do Julia and Matlab perform similar numbers of function evaluations?
>
> (2) If they don't perform similar numbers of function evaluations, is one of them producing a better solution? Is the one that's producing a better solution doing more function evaluations?
>
> (3) If they're doing different numbers of function evaluations and the one that does _fewer_ evaluations also produces a better solution, what's the reason? For example, is our line search default less effective for this problem than the Matlab line search? If you try other line search algorithms, do the results stay the same?
>
> Unfortunately, answering all of these reliably may take us all pretty far down the rabbit hole. But they're worth pushing on systematically.
>
> -- John
>
> On May 22, 2014, at 2:59 PM, Miles Lubin <[email protected]> wrote:
>
> I can get another 50% speedup by:
>
> - Running the optimization twice and timing the second run only; this is the more appropriate way to benchmark Julia because it excludes the function compilation time
> - Setting autodiff=true
> - Breaking up the long chains of sums, which apparently are slow
>
> At this point one really needs to compare the number of function evaluations in each method, as John suggested.
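A minimal sketch of the double-run timing pattern Miles describes, using the names from the gists (clogit_ll, data, nVar):

    using Optim

    f = beta -> clogit_ll(beta, data)

    # The first run triggers JIT compilation of the objective and the solver.
    Optim.optimize(f, ones(nVar), method = :l_bfgs, autodiff = true)

    # The second run measures steady-state performance only.
    @time res = Optim.optimize(f, ones(nVar), method = :l_bfgs, autodiff = true)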
> On Thursday, May 22, 2014 9:53:36 AM UTC-4, Holger Stichnoth wrote:
>>
>> Thanks, it's faster now (by roughly a factor of 3 on my computer), but still considerably slower than fminunc:
>>
>> Averages over 20 runs:
>> Julia/Optim.optimize: 10.5s
>> Matlab/fminunc: 2.6s
>>
>> Here are my Matlab settings:
>>
>> options = optimset('Display', 'iter', ...
>>     'MaxIter', 2500, 'MaxFunEvals', 500000, ...
>>     'TolFun', 1e-6, 'TolX', 1e-6, ...
>>     'GradObj', 'off', 'DerivativeCheck', 'off');
>>
>> startb = ones(1,nVar)';
>> [estim_clo, ll_clo] = ...
>>     fminunc(@(param)clogit_ll(param,data), ...
>>     startb, options);
>>
>> Could the speed issue be related to the following messages that I get when I run the Julia code?
>>
>> C:\Users\User\Documents\References\Software\Julia\mlubin>julia main.jl
>> Warning: could not import Base.foldl into NumericExtensions
>> Warning: could not import Base.foldr into NumericExtensions
>> Warning: could not import Base.sum! into NumericExtensions
>> Warning: could not import Base.maximum! into NumericExtensions
>> Warning: could not import Base.minimum! into NumericExtensions
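For reference, a rough Optim.jl equivalent of these fminunc settings might look as follows. The keywords method, autodiff, and xtol appear elsewhere in this thread; the mapping of MaxIter/TolFun/TolX to iterations/ftol/xtol is an approximation, not an exact correspondence:

    res = Optim.optimize(beta -> clogit_ll(beta, data), ones(nVar),
                         method = :l_bfgs,
                         iterations = 2500,  # MaxIter
                         ftol = 1e-6,        # TolFun
                         xtol = 1e-6)        # TolX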
>> On Thursday, 22 May 2014 14:18:36 UTC+1, Miles Lubin wrote:
>>>
>>> I was able to get a nearly 5x speedup by avoiding the matrix allocation and making the accumulators type stable:
>>> https://gist.github.com/mlubin/055690ddf2466e98bba6
>>>
>>> How does this compare with Matlab now?
>>>
>>> On Thursday, May 22, 2014 6:38:44 AM UTC-4, Holger Stichnoth wrote:
>>>>
>>>> @John: You are right, when I specify the function as clogit_ll(beta::Vector) instead of clogit_ll(beta::Vector{Float64}), autodiff = true works fine. Thanks for your help!
>>>>
>>>> @Tim: I have set the rather strict default convergence criteria of Optim.optimize to Matlab's default values for fminunc, but the speed difference is still there.
>>>>
>>>> @Miles/John: Getting rid of the global variables through closures and devectorizing made the optimization _slower_, not faster, in my case: https://gist.github.com/stichnoth/7f251ded83dcaa384273. I was surprised to see this, as I expected a speed increase myself.
>>>>
>>>> On Wednesday, 21 May 2014 16:48:51 UTC+1, Miles Lubin wrote:
>>>>>
>>>>> Just to extend on what John said, I also think that if you can restructure the code to devectorize it and avoid using global variables, you'll see *much* better performance.
>>>>>
>>>>> The way to avoid globals is by using closures, for example:
>>>>>
>>>>> function foo(x, data)
>>>>>     ...
>>>>> end
>>>>>
>>>>> ...
>>>>> data_raw = readcsv(file)
>>>>> data = reshape(data_raw, nObs, nChoices*(1+nVar), T)
>>>>>
>>>>> Optim.optimize(x -> foo(x, data), ...)
>>>>>
>>>>> On Tuesday, May 20, 2014 11:47:39 AM UTC-4, John Myles White wrote:
>>>>>>
>>>>>> Glad that you were able to figure out the source of your problems.
>>>>>>
>>>>>> It would be good to get a sense of the amount of time spent inside your objective function vs. the amount of time spent in the code for optimize(). In general, my experience is that >>90% of the compute time for an optimization problem is spent in the objective function itself. If you instrument your objective function to produce timing information on each call, that would help a lot, since you could then get a sense of how much time is being spent in the code for optimize() after accounting for your function itself.
>>>>>>
>>>>>> It's also worth keeping in mind that your use of implicit finite differencing means that your objective function is being called a lot more times than theoretically necessary, so any minor performance issue in it will very substantially slow down the solver.
>>>>>>
>>>>>> Regarding your objective function's code, I suspect that the combination of global variables and memory-allocating vectorized arithmetic means that your objective function might be a good bit slower in Julia than in Matlab. Matlab seems to be a little better about garbage collection for vectorized arithmetic, and Julia is generally not able to optimize code involving global variables.
>>>>>>
>>>>>> Hope that points you in the right direction.
>>>>>>
>>>>>> — John
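A minimal way to do the instrumentation John suggests is to wrap the objective in a function that counts calls and accumulates time; this also answers his question about the number of function evaluations. The names clogit_ll and nVar are again taken from the gists:

    # Call count and accumulated seconds, kept in a const array so that
    # access from inside the wrapper stays type stable.
    const counts = [0.0, 0.0]

    function clogit_ll_timed(beta)
        t0 = time()
        v = clogit_ll(beta)   # the actual objective
        counts[1] += 1
        counts[2] += time() - t0
        return v
    end

    res = Optim.optimize(clogit_ll_timed, ones(nVar), method = :l_bfgs)
    println("calls: ", counts[1], ", seconds in objective: ", counts[2])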
>>>>>> On May 20, 2014, at 8:34 AM, Holger Stichnoth <[email protected]> wrote:
>>>>>>
>>>>>> Hi Andreas, hi John, hi Miles (via julia-opt, where I mistakenly also posted my question yesterday),
>>>>>>
>>>>>> Thanks for your help. Here is the link to the Gist I created: https://gist.github.com/anonymous/5f95ab1afd241c0a5962
>>>>>>
>>>>>> In the process of producing a minimal (non-)working example, I discovered that the unexpected results are due to the truncation of the logit choice probabilities in the objective function. Optim.optimize() is sensitive to this when method = :l_bfgs is used. With method = :nelder_mead, everything works fine. When I comment out the truncation, :l_bfgs works as well. However, I need to increase the xtol from its default of 1e-12 to at least 1e-10; otherwise I get the error that the line search failed to converge.
>>>>>>
>>>>>> I guess I should just do without the truncation. The logit probabilities are between 0 and 1 by construction anyway. I had just copied the truncation code from a friend who had told me that probabilities that are too close to 0 or 1 sometimes cause numerical problems in his Matlab code for the same function. With Optim.optimize(), it seems to be the other way around, i.e. moving the probabilities even slightly away from 0 or 1 reduces the stability of the (gradient-based) algorithm.
>>>>>>
>>>>>> So for me, the problem is solved. The problem was not with Optim.jl, but with my own code.
>>>>>>
>>>>>> The only other thing that I discovered when trying out Julia and Optim.jl is that the optimization is currently considerably slower than Matlab's fminunc. From the Gist I provided above, are there any potential performance improvements that I am missing out on?
>>>>>>
>>>>>> Best wishes,
>>>>>> Holger
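For concreteness, the truncation being discussed is presumably something like the following sketch (the bound 1e-10 is illustrative, and truncate_probs! is a hypothetical name). Clamping makes the objective exactly flat wherever the bound binds and non-differentiable at the boundary, which may be why a finite-difference, gradient-based method such as :l_bfgs becomes less stable with the truncation in place:

    # p: vector of logit choice probabilities, in (0, 1) by construction
    function truncate_probs!(p, bound)
        for i in 1:length(p)
            # pull each probability away from 0 and 1
            p[i] = max(bound, min(1 - bound, p[i]))
        end
        return p
    end

    truncate_probs!(p, 1e-10)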
>>>>>> On Monday, 19 May 2014 14:51:16 UTC+1, John Myles White wrote:
>>>>>>>
>>>>>>> If you can, please do share an example of your code. Logit-style models are in general numerically unstable, so it would be good to see how exactly you've coded things up.
>>>>>>>
>>>>>>> One thing you may be able to do is use automatic differentiation via the autodiff = true keyword to optimize, but that assumes that your objective function is written in completely pure Julia code (which means, for example, that your code must not call any functions not written in Julia, such as those provided by Distributions.jl).
>>>>>>>
>>>>>>> — John
>>>>>>>
>>>>>>> On May 19, 2014, at 4:09 AM, Andreas Noack Jensen <[email protected]> wrote:
>>>>>>>
>>>>>>> What is the output of versioninfo() and Pkg.installed("Optim")? Also, would it be possible to make a gist with your code?
>>>>>>>
>>>>>>> 2014-05-19 12:44 GMT+02:00 Holger Stichnoth <[email protected]>:
>>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I installed Julia a couple of days ago and was impressed by how easy it was to make the switch from Matlab and to parallelize my code (something I had never done before in any language; I'm an economist with only limited programming experience, mainly in Stata and Matlab).
>>>>>>>>
>>>>>>>> However, I ran into a problem when using Optim.jl for Maximum Likelihood estimation of a conditional logit model. With the default Nelder-Mead algorithm, optimize from the Optim.jl package gave me the same result that I had obtained in Stata and Matlab.
>>>>>>>>
>>>>>>>> With gradient-based methods such as BFGS, however, the algorithm jumped from the starting values to parameter values that are completely different. This happened for all the starting values I tried, including the case in which I took a vector close to the optimum from the Nelder-Mead algorithm.
>>>>>>>>
>>>>>>>> The problem seems to be that the algorithm tried values so large (in absolute value) that they caused problems for the objective function, where I call exponential functions into which these parameter values enter. As a result, the optimization based on the BFGS algorithm did not produce the expected optimum.
>>>>>>>>
>>>>>>>> While I could try to provide the analytical gradient in this simple case, I was planning to use Julia for Maximum Likelihood or Simulated Maximum Likelihood estimation in cases where the gradient is more difficult to derive, so it would be good if I could make the optimizer run with numerical gradients as well.
>>>>>>>>
>>>>>>>> I suspect that my problems with optimize from Optim.jl could have something to do with the gradient() function. In the example below, for instance, I do not understand why the output of the gradient function includes values such as 11470.7, given that the function values differ only minimally.
>>>>>>>>
>>>>>>>> Best wishes,
>>>>>>>> Holger
>>>>>>>>
>>>>>>>> julia> Optim.gradient(clogit_ll, zeros(4))
>>>>>>>> 6.0554544523933395e-6
>>>>>>>> 0
>>>>>>>> 0
>>>>>>>> 0
>>>>>>>>
>>>>>>>> 14923.564009972584
>>>>>>>> -6.0554544523933395e-6
>>>>>>>> 0
>>>>>>>> 0
>>>>>>>> 0
>>>>>>>>
>>>>>>>> 14923.565228435104
>>>>>>>> 0
>>>>>>>> 6.0554544523933395e-6
>>>>>>>> 0
>>>>>>>> 0
>>>>>>>>
>>>>>>>> 14923.569064311248
>>>>>>>> 0
>>>>>>>> -6.0554544523933395e-6
>>>>>>>> 0
>>>>>>>> 0
>>>>>>>>
>>>>>>>> 14923.560174904109
>>>>>>>> 0
>>>>>>>> 0
>>>>>>>> 6.0554544523933395e-6
>>>>>>>> 0
>>>>>>>>
>>>>>>>> 14923.63413848258
>>>>>>>> 0
>>>>>>>> 0
>>>>>>>> -6.0554544523933395e-6
>>>>>>>> 0
>>>>>>>>
>>>>>>>> 14923.495218282553
>>>>>>>> 0
>>>>>>>> 0
>>>>>>>> 0
>>>>>>>> 6.0554544523933395e-6
>>>>>>>>
>>>>>>>> 14923.58699717058
>>>>>>>> 0
>>>>>>>> 0
>>>>>>>> 0
>>>>>>>> -6.0554544523933395e-6
>>>>>>>>
>>>>>>>> 14923.54224130672
>>>>>>>> 4-element Array{Float64,1}:
>>>>>>>>  -100.609
>>>>>>>>   734.0
>>>>>>>> 11470.7
>>>>>>>>  3695.5
>>>>>>>>
>>>>>>>> function clogit_ll(beta::Vector)
>>>>>>>>
>>>>>>>>     # Print the parameters and the return value to
>>>>>>>>     # check how gradient() and optimize() work.
>>>>>>>>     println(beta)
>>>>>>>>     println(-sum(compute_ll(beta,T,0)))
>>>>>>>>
>>>>>>>>     # compute_ll computes the individual likelihood contributions
>>>>>>>>     # in the sample. T is the number of periods in the panel. The 0
>>>>>>>>     # is not used in this simple example. In related functions, I
>>>>>>>>     # pass on different values here to estimate finite mixtures of
>>>>>>>>     # the conditional logit model.
>>>>>>>>     return -sum(compute_ll(beta,T,0))
>>>>>>>> end
>>>>>>>
>>>>>>> --
>>>>>>> Kind regards,
>>>>>>>
>>>>>>> Andreas Noack Jensen
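A note on the gradient output above: the perturbation 6.0554544523933395e-6 is cbrt(eps(Float64)), a standard step size for central finite differences. A sketch of the kind of computation involved (illustrative; not Optim.jl's exact implementation):

    # Central finite differences with step h = cbrt(eps()), the
    # perturbation visible in the printed beta vectors above.
    function central_diff(f, x)
        h = cbrt(eps(Float64))   # ≈ 6.0554544523933395e-6
        g = zeros(length(x))
        for i in 1:length(x)
            e = zeros(length(x))
            e[i] = h
            g[i] = (f(x + e) - f(x - e)) / (2h)
        end
        return g
    end

With function values around 14923.5 and a divisor of 2h ≈ 1.211e-5, even tiny differences produce large gradient entries. For the third coordinate above, (14923.63413848258 - 14923.495218282553) / (2h) ≈ 11470.7, so an entry like 11470.7 is exactly what central differencing yields from function values that "differ only minimally".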
