How many function evaluations is Matlab performing and how many is Julia performing?
— John On May 22, 2014, at 6:53 AM, Holger Stichnoth <[email protected]> wrote: > Thanks, it's faster now (by roughly a factor of 3 on my computer), but still > considerably slower than fminunc: > > Averages over 20 runs: > Julia/Optim.optimize: 10.5s > Matlab/fminunc: 2.6s > > Here are my Matlab settings: > options = optimset('Display', 'iter', ... > 'MaxIter', 2500, 'MaxFunEvals', 500000, ... > 'TolFun', 1e-6, 'TolX', 1e-6, ... > 'GradObj', 'off', 'DerivativeCheck', 'off'); > > startb = ones(1,nVar)'; > [estim_clo, ll_clo]= ... > fminunc(@(param)clogit_ll(param,data), ... > startb,options); > > Could the speed issue be related to the following messages that I get when I > run the Julia code? > C:\Users\User\Documents\References\Software\Julia\mlubin>julia main.jl > Warning: could not import Base.foldl into NumericExtensions > Warning: could not import Base.foldr into NumericExtensions > Warning: could not import Base.sum! into NumericExtensions > Warning: could not import Base.maximum! into NumericExtensions > Warning: could not import Base.minimum! into NumericExtensions > > > > > Am Donnerstag, 22. Mai 2014 14:18:36 UTC+1 schrieb Miles Lubin: > I was able to get a nearly 5x speedup by avoiding the matrix allocation and > making the accumulators type stable: > https://gist.github.com/mlubin/055690ddf2466e98bba6 > > How does this compare with Matlab now? > > On Thursday, May 22, 2014 6:38:44 AM UTC-4, Holger Stichnoth wrote: > @ John: You are right, when I specify the function as clogit_ll(beta::Vector) > instead of clogit_ll(beta::Vector{Float64}), autodiff = true works fine. > Thanks for your help! > > @ Tim: I have set the rather strict default convergence criteria of > Optim.optimize to Matlab's default values for fminunc, but the speed > difference is still there. > > @ Miles/John: Getting rid of the global variables through closures and > devectorizing made the optimization _slower_ not faster in my case: > https://gist.github.com/stichnoth/7f251ded83dcaa384273. I was surprised to > see this as I expected a speed increase myself. > > > Am Mittwoch, 21. Mai 2014 16:48:51 UTC+1 schrieb Miles Lubin: > Just to extend on what John said, also think that if you can restructure the > code to devectorize it and avoid using global variables, you'll see *much* > better performance. > > The way to avoid globals is by using closures, for example: > function foo(x, data) > ... > end > > > ... > data_raw = readcsv(file) > data = reshape(data_raw, nObs, nChoices*(1+nVar), T) > > > > Optim.optimize(x-> foo(x,data), ...) > > > > On Tuesday, May 20, 2014 11:47:39 AM UTC-4, John Myles White wrote: > Glad that you were able to figure out the source of your problems. > > It would be good to get a sense of the amount of time spent inside your > objective function vs. the amount of time spent in the code for optimize(). > In general, my experience is that >>90% of the compute time for an > optimization problem is spent in the objective function itself. If you > instrument your objective function to produce timing information on each > call, that would help a lot since you could then get a sense of how much time > is being spent in the code for optimize() after accounting for your function > itself. > > It’s also worth keeping in mind that your use of implicit finite differencing > means that your objective function is being called a lot more times than > theoretically necessary, so that any minor performance issue in it will very > substantially slow down the solver. > > Regarding you objective function’s code, I suspect that the combination of > global variables and memory-allocating vectorized arithmetic means that your > objective function might be a good bit slower in Julia than in Matlab. Matlab > seems to be a little better about garbage collection for vectorized > arithmetic and Julia is generally not able to optimize code involving global > variables. > > Hope that points you in the right direction. > > — John > > On May 20, 2014, at 8:34 AM, Holger Stichnoth <[email protected]> wrote: > >> Hi Andreas, >> hi John, >> hi Miles (via julia-opt, where I mistakenly also posted my question >> yesterday), >> >> Thanks for your help. Here is the link to the Gist I created: >> https://gist.github.com/anonymous/5f95ab1afd241c0a5962 >> >> In the process of producing a minimal (non-)working example, I discovered >> that the unexpected results are due to the truncation of the logit choice >> probabilities in the objective function. Optim.optimize() is sensitive to >> this when method = :l_bfgs is used. With method = :nelder_mead, everything >> works fine. When I comment out the truncation, :l_bfgs works as well. >> However, I need to increase the xtol from its default of 1e-12 to at least >> 1e-10, otherwise I get the error that the linesearch failed to converge. >> >> I guess I should just do without the truncation. The logit probabilities are >> between 0 and 1 by construction anyway. I had just copied the truncation >> code from a friend who had told me that probabilities that are too close to >> 0 or 1 sometimes cause numerical problems in his Matlab code of the same >> function. With Optim.optimize(), it seems to be the other way around, i.e. >> moving the probabilities further away from 0 or 1 (even by tiny amounts) >> means that the stability of the (gradient-based) algorithm is reduced. >> >> So for me, the problem is solved. The problem was not with Optim.jl, but >> with my own code. >> >> The only other thing that I discovered when trying out Julia and Optim.jl is >> that the optimization is currently considerably slower than Matlab's >> fminunc. From the Gist I provided above, are there any potential performance >> improvements that I am missing out on? >> >> Best wishes, >> Holger >> >> >> On Monday, 19 May 2014 14:51:16 UTC+1, John Myles White wrote: >> If you can, please do share an example of your code. Logit-style models are >> in general numerically unstable, so it would be good to see how exactly >> you’ve coded things up. >> >> One thing you may be able to do is use automatic differentiation via the >> autodiff = true keyword to optimize, but that assumes that your objective >> function is written in completely pure Julia code (which means, for example, >> that your code must not call any of functions not written in Julia provided >> by Distributions.jl). >> >> — John >> >> On May 19, 2014, at 4:09 AM, Andreas Noack Jensen <[email protected]> >> wrote: >> >>> What is the output of versioninfo() and Pkg.installed("Optim")? Also, would >>> it be possible to make a gist with your code? >>> >>> >>> 2014-05-19 12:44 GMT+02:00 Holger Stichnoth <[email protected]>: >>> Hello, >>> >>> I installed Julia a couple of days ago and was impressed how easy it was to >>> make the switch from Matlab and to parallelize my code >>> (something I had never done before in any language; I'm an economist with >>> only limited programming experience, mainly in Stata and Matlab). >>> >>> However, I ran into a problem when using Optim.jl for Maximum Likelihood >>> estimation of a conditional logit model. With the default Nelder-Mead >>> algorithm, optimize from the Optim.jl package gave me the same result that >>> I had obtained in Stata and Matlab. >>> >>> With gradient-based methods such as BFGS, however, the algorithm jumped >>> from the starting values to parameter values that are completely different. >>> This happened for all thr starting values I tried, including the case in >>> which I took a vector that is closed to the optimum from the Nelder-Mead >>> algorithm. >>> >>> The problem seems to be that the algorithm tried values so large (in >>> absolute value) that this caused problems for the objective >>> function, where I call exponential functions into which these parameter >>> values enter. As a result, the optimization based on the BFGS algorithm did >>> not produce the expected optimum. >>> >>> While I could try to provide the analytical gradient in this simple case, I >>> was planning to use Julia for Maximum Likelihood or Simulated Maximum >>> Likelihood estimation in cases where the gradient is more difficult to >>> derive, so it would be good if I could make the optimizer run also with >>> numerical gradients. >>> >>> I suspect that my problems with optimize from Optim.jl could have something >>> to do with the gradient() function. In the example below, for instance, I >>> do not understand why the output of the gradient function includes values >>> such as 11470.7, given that the function values differ only minimally. >>> >>> Best wishes, >>> Holger >>> >>> >>> julia> Optim.gradient(clogit_ll,zeros(4)) >>> 60554544523933395e-22 >>> 0Op >>> 0 >>> 0 >>> >>> 14923.564009972584 >>> -60554544523933395e-22 >>> 0 >>> 0 >>> 0 >>> >>> 14923.565228435104 >>> 0 >>> 60554544523933395e-22 >>> 0 >>> 0 >>> >>> 14923.569064311248 >>> 0 >>> -60554544523933395e-22 >>> 0 >>> 0 >>> >>> 14923.560174904109 >>> 0 >>> 0 >>> 60554544523933395e-22 >>> 0 >>> >>> 14923.63413848258 >>> 0 >>> 0 >>> -60554544523933395e-22 >>> 0 >>> >>> 14923.495218282553 >>> 0 >>> 0 >>> 0 >>> 60554544523933395e-22 >>> >>> 14923.58699717058 >>> 0 >>> 0 >>> 0 >>> -60554544523933395e-22 >>> >>> 14923.54224130672 >>> 4-element Array{Float64,1}: >>> -100.609 >>> 734.0 >>> 11470.7 >>> 3695.5 >>> >>> function clogit_ll(beta::Vector) >>> >>> # Print the parameters and the return value to >>> # check how gradient() and optimize() work. >>> println(beta) >>> println(-sum(compute_ll(beta,T,0))) >>> >>> # compute_ll computes the individual likelihood contributions >>> # in the sample. T is the number of periods in the panel. The 0 >>> # is not used in this simple example. In related functions, I >>> # pass on different values here to estimate finite mixtures of >>> # the conditional logit model. >>> return -sum(compute_ll(beta,T,0)) >>> end >>> >>> >>> >>> >>> >>> >>> >>> >>> -- >>> Med venlig hilsen >>> >>> Andreas Noack Jensen >> >
