Re: [julia-users] Optim.jl: unexpected results when using gradient-based methods

Tim Holy Sun, 22 Jun 2014 11:46:24 -0700

I didn't look at ll2. But that one seems OK.

I didn't read the whole thread; are you timing just the execution of the 
objective function, or of the whole optimization? You can't easily interpret 
the latter.
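
A minimal sketch of timing only the objective function, assuming a current Julia 1.x. `toy_objective` and `time_calls` are hypothetical names standing in for the real objective, not code from this thread:

```julia
# Hypothetical stand-in for the real objective (assumption, not from the thread).
toy_objective(beta) = sum(abs2, beta .- 1.0)

# Time `n` calls of `f(x)`, excluding the first call, which includes compilation.
function time_calls(f, x, n)
    f(x)                      # warm-up call; result discarded
    t = @elapsed begin
        for _ in 1:n
            f(x)
        end
    end
    return t
end

elapsed = time_calls(toy_objective, zeros(4), 50)
println("50 objective evaluations: ", elapsed, " s")
```

Timing the whole optimization instead mixes in the number of function evaluations the solver happens to make, which is why the two figures are hard to compare.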


--Tim

On Sunday, June 22, 2014 09:13:49 AM Thibaut Lamadon wrote:
> Hi Tim
> 
> Is this a concern even though I declare u1::Float64 = 0; at the beginning
> of the function, in ll2?
> 
> t.
> 
> On Sunday, 22 June 2014 15:57:53 UTC+1, Tim Holy wrote:
> > If x1, ..., x6 or coeff are Float64 arrays, then the initialization
> > 
> >     u1 = 0; u2 = 0; u3 = 0; u4 = 0; u5 = 0; u6 = 0
> > 
> > is problematic as soon as you get to
> > 
> >         for k=1:nVar
> >         
> >             u1 += x1[i + ni*( k-1 + nk* (t-1))]*coeff[k]
> >             u2 += x2[i + ni*( k-1 + nk* (t-1))]*coeff[k]
> >             u3 += x3[i + ni*( k-1 + nk* (t-1))]*coeff[k]
> >             u4 += x4[i + ni*( k-1 + nk* (t-1))]*coeff[k]
> >             u5 += x5[i + ni*( k-1 + nk* (t-1))]*coeff[k]
> >             u6 += x6[i + ni*( k-1 + nk* (t-1))]*coeff[k]
> >         
> >         end
> > 
> > because you're initializing them to be integers but then they get
> > converted into Float64. A more careful approach is to do something like
> > this:
> > 
> >     T = typeof(one(eltype(x1))*one(eltype(coeff)))
> >     TT = typeof(one(T) + one(T))
> >     u1 = u2 = u3 = u4 = u5 = u6 = zero(TT)
> > 
> > In general, code_typed is your friend: look for Union types. Run
> > 
> >     T = Vector{Float64}
> >     code_typed(ll, (T, T, T, T, T, T, T, T, T, T, T, T, T, Float64, Int, Int))
> > 
> > and you'll see Union types all over the place. (TypeCheck can help too,
> > but it didn't seem to pick up this error.) And see the Performance Tips
> > section of the manual.
> > 
> > --Tim
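
The accumulator point above can be sketched on a smaller case, assuming a current Julia 1.x. `dot_unstable` and `dot_stable` are hypothetical names, and a single accumulator stands in for u1 to u6:

```julia
# The accumulator starts as an Int and becomes a Float64 inside the loop,
# so its type changes when x and coeff hold Float64 values.
function dot_unstable(x, coeff)
    u = 0
    for k in eachindex(x, coeff)
        u += x[k] * coeff[k]
    end
    return u
end

# The accumulator starts as the zero of the type the sum of products will have.
function dot_stable(x, coeff)
    T = typeof(one(eltype(x)) * one(eltype(coeff)))  # type of one product
    TT = typeof(one(T) + one(T))                      # type of a sum of products
    u = zero(TT)
    for k in eachindex(x, coeff)
        u += x[k] * coeff[k]
    end
    return u
end

x = [1.0, 2.0, 3.0]
coeff = [0.5, 0.5, 0.5]
println(dot_stable(x, coeff))
```

Both functions return the same value; the difference is only in what the compiler can infer. In current Julia, `@code_warntype dot_unstable(x, coeff)` (from InteractiveUtils) is a convenient way to see the Union type of `u`.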
> > 
> > On Sunday, June 22, 2014 04:50:16 AM Thibaut Lamadon wrote:
> > > Hi guys,
> > > 
> > > I wanted to look into this as well. The main issue, I think, is the
> > > speed of the objective function. Running @time on the objective
> > > function suggested a large amount of memory allocation. Checking the
> > > types revealed that getting x and y from data would set their types to
> > > Any.
> > > 
> > > So I converted the data to Array{Float64,3}, and then changed the
> > > function to store only the cumulative sum, not the vector of
> > > likelihoods. I ran the objective function 50 times.
> > > 
> > > without the convert I get a total time of 9.18s
> > > with the convert and original function I get 4.15s
> > > with the convert and the new function I get 1.49s
> > > with matlab, I get 0.64s
> > > 
> > > So Matlab still appears to be roughly 2.3 times faster (1.49s vs.
> > > 0.64s). But I am guessing Matlab is using SIMD instructions when
> > > computing matrix multiplications. So we would need to try to use BLAS
> > > in Julia with matrix multiplication to get a good comparison.
> > > 
> > > Anyway, fixing the type of the input, and just summing inside the loop
> > > gives a 6x speed up.
> > > 
> > > PS: Running the full optimization
> > > with the convert and the new function I get 4.8s
> > > with my matlab I get 4s
> > > 
> > > I could not commit to Holger's gist, so I forked
> > > it: https://gist.github.com/tlamadon/58c47c115f8cf2388e89
> > > Please check that I have not done anything stupid, but the output
> > > seemed similar.
> > > 
> > > Holger, I hope you are having a good time at home (or in Paris?). And a
> > > World Cup note: Allez les bleus!
> > > 
> > > very best,
> > > 
> > > t.
> > > 
> > > On Tuesday, 27 May 2014 14:03:30 UTC+1, Holger Stichnoth wrote:
> > > > Hi John, hi Miles,
> > > > 
> > > > Thanks to both of you. I did not have time to look into this over the
> > > > weekend; I will do so in the next couple of days. I have already
> > > > uploaded the Matlab files for comparison:
> > > > https://gist.github.com/stichnoth/7f251ded83dcaa384273
> > > > 
> > > > Holger
> > > > 
> > > > On Thursday, 22 May 2014 23:03:58 UTC+1, John Myles White wrote:
> > > >> Yeah, this case is tricky enough that we really need to get down to
> > > >> the lowest details:
> > > >> 
> > > >> (1) Do Julia and Matlab perform similar numbers of function
> > > >> evaluations?
> > > >> 
> > > >> (2) If they don't perform similar numbers of function evaluations, is
> > > >> one of them producing a better solution? Is the one that's producing a
> > > >> better solution doing more function evaluations?
> > > >> 
> > > >> (3) If they're doing different numbers of function evaluations and
> > > >> the one that does _fewer_ evaluations also produces a better solution,
> > > >> what's the reason? For example, is our line search default less
> > > >> effective for this problem than the Matlab line search? If you try
> > > >> other line search algorithms, do the results stay the same?
> > > >> 
> > > >> Unfortunately, answering all of these reliably may take us all
> > > >> pretty far down the rabbit hole. But they're worth pushing on
> > > >> systematically.
> > > >> 
> > > >>  -- John
> > > >> 
> > > >> On May 22, 2014, at 2:59 PM, Miles Lubin <[email protected]> wrote:
> > > >> 
> > > >> I can get another 50% speedup by:
> > > >> 
> > > >> - Running the optimization twice and timing the second run only; this
> > > >>   is the more appropriate way to benchmark Julia because it excludes
> > > >>   the function compilation time
> > > >> - Setting autodiff=true
> > > >> - Breaking up the long chains of sums; these seem to be slow
> > > >> 
> > > >> At this point one really needs to compare the number of function
> > > >> evaluations in each method, as John suggested.
> > > >> 
> > > >> On Thursday, May 22, 2014 9:53:36 AM UTC-4, Holger Stichnoth wrote:
> > > >>> Thanks, it's faster now (by roughly a factor of 3 on my computer),
> > > >>> but still considerably slower than fminunc:
> > > >>> 
> > > >>> Averages over 20 runs:
> > > >>> Julia/Optim.optimize: 10.5s
> > > >>> Matlab/fminunc: 2.6s
> > > >>> 
> > > >>> Here are my Matlab settings:
> > > >>> options = optimset('Display', 'iter', ...
> > > >>>      'MaxIter', 2500, 'MaxFunEvals', 500000, ...
> > > >>>      'TolFun', 1e-6, 'TolX', 1e-6, ...
> > > >>>      'GradObj', 'off', 'DerivativeCheck', 'off');
> > > >>> 
> > > >>> startb      = ones(1,nVar)';
> > > >>> [estim_clo, ll_clo]= ...
> > > >>>      fminunc(@(param)clogit_ll(param,data), ...
> > > >>>      startb,options);
> > > >>> 
> > > >>> Could the speed issue be related to the following messages that I
> > > >>> get when I run the Julia code?
> > > >>> C:\Users\User\Documents\References\Software\Julia\mlubin>julia main.jl
> > 
> > > >>> Warning: could not import Base.foldl into NumericExtensions
> > > >>> Warning: could not import Base.foldr into NumericExtensions
> > > >>> Warning: could not import Base.sum! into NumericExtensions
> > > >>> Warning: could not import Base.maximum! into NumericExtensions
> > > >>> Warning: could not import Base.minimum! into NumericExtensions
> > > >>> 
> > > >>> On Thursday, 22 May 2014 14:18:36 UTC+1, Miles Lubin wrote:
> > > >>>> I was able to get a nearly 5x speedup by avoiding the matrix
> > > >>>> allocation and making the accumulators type stable:
> > > >>>> https://gist.github.com/mlubin/055690ddf2466e98bba6
> > > >>>> 
> > > >>>> How does this compare with Matlab now?
> > > >>>> 
> > > >>>> On Thursday, May 22, 2014 6:38:44 AM UTC-4, Holger Stichnoth wrote:
> > > >>>>> @ John: You are right, when I specify the function as
> > > >>>>> clogit_ll(beta::Vector) instead of
> > > >>>>> clogit_ll(beta::Vector{Float64}), autodiff = true works fine.
> > > >>>>> Thanks for your help!
> > > >>>>> 
> > > >>>>> @ Tim: I have set the rather strict default convergence criteria
> > > >>>>> of Optim.optimize to Matlab's default values for fminunc, but the
> > > >>>>> speed difference is still there.
> > > >>>>> 
> > > >>>>> @ Miles/John: Getting rid of the global variables through closures
> > > >>>>> and devectorizing made the optimization _slower_, not faster, in my
> > > >>>>> case: https://gist.github.com/stichnoth/7f251ded83dcaa384273. I was
> > > >>>>> surprised to see this as I expected a speed increase myself.
> > > >>>>> 
> > > >>>>> On Wednesday, 21 May 2014 16:48:51 UTC+1, Miles Lubin wrote:
> > > >>>>>> Just to extend on what John said, I also think that if you can
> > > >>>>>> restructure the code to devectorize it and avoid using global
> > > >>>>>> variables, you'll see *much* better performance.
> > > >>>>>> 
> > > >>>>>> The way to avoid globals is by using closures, for example:
> > > >>>>>> 
> > > >>>>>> function foo(x, data)
> > > >>>>>>     ...
> > > >>>>>> end
> > > >>>>>> 
> > > >>>>>> ...
> > > >>>>>> data_raw = readcsv(file)
> > > >>>>>> data = reshape(data_raw, nObs, nChoices*(1+nVar), T)
> > > >>>>>> 
> > > >>>>>> Optim.optimize(x-> foo(x,data), ...)
> > > >>>>>> 
> > > >>>>>> On Tuesday, May 20, 2014 11:47:39 AM UTC-4, John Myles White wrote:
> > > >>>>>>> Glad that you were able to figure out the source of your problems.
> > > >>>>>>> 
> > > >>>>>>> It would be good to get a sense of the amount of time spent inside
> > > >>>>>>> your objective function vs. the amount of time spent in the code
> > > >>>>>>> for optimize(). In general, my experience is that well over 90% of
> > > >>>>>>> the compute time for an optimization problem is spent in the
> > > >>>>>>> objective function itself. If you instrument your objective
> > > >>>>>>> function to produce timing information on each call, that would
> > > >>>>>>> help a lot since you could then get a sense of how much time is
> > > >>>>>>> being spent in the code for optimize() after accounting for your
> > > >>>>>>> function itself.
> > > >>>>>>> 
> > > >>>>>>> It’s also worth keeping in mind that your use of implicit finite
> > > >>>>>>> differencing means that your objective function is being called a
> > > >>>>>>> lot more times than theoretically necessary, so that any minor
> > > >>>>>>> performance issue in it will very substantially slow down the
> > > >>>>>>> solver.
> > > >>>>>>> 
> > > >>>>>>> Regarding your objective function’s code, I suspect that the
> > > >>>>>>> combination of global variables and memory-allocating vectorized
> > > >>>>>>> arithmetic means that your objective function might be a good bit
> > > >>>>>>> slower in Julia than in Matlab. Matlab seems to be a little better
> > > >>>>>>> about garbage collection for vectorized arithmetic and Julia is
> > > >>>>>>> generally not able to optimize code involving global variables.
> > > >>>>>>> 
> > > >>>>>>> Hope that points you in the right direction.
> > > >>>>>>> 
> > > >>>>>>>  — John
> > > >>>>>>> 
> > > >>>>>>> On May 20, 2014, at 8:34 AM, Holger Stichnoth <[email protected]> wrote:
> > > >>>>>>> 
> > > >>>>>>> Hi Andreas,
> > > >>>>>>> hi John,
> > > >>>>>>> hi Miles (via julia-opt, where I mistakenly also posted my question
> > > >>>>>>> yesterday),
> > > >>>>>>> 
> > > >>>>>>> Thanks for your help. Here is the link to the Gist I created:
> > > >>>>>>> https://gist.github.com/anonymous/5f95ab1afd241c0a5962
> > > >>>>>>> 
> > > >>>>>>> In the process of producing a minimal (non-)working example, I
> > > >>>>>>> discovered that the unexpected results are due to the truncation of
> > > >>>>>>> the logit choice probabilities in the objective function.
> > > >>>>>>> Optim.optimize() is sensitive to this when method = :l_bfgs is used.
> > > >>>>>>> With method = :nelder_mead, everything works fine. When I comment
> > > >>>>>>> out the truncation, :l_bfgs works as well. However, I need to
> > > >>>>>>> increase the xtol from its default of 1e-12 to at least 1e-10,
> > > >>>>>>> otherwise I get the error that the linesearch failed to converge.
> > > >>>>>>> 
> > > >>>>>>> I guess I should just do without the truncation. The logit
> > > >>>>>>> probabilities are between 0 and 1 by construction anyway. I had just
> > > >>>>>>> copied the truncation code from a friend who had told me that
> > > >>>>>>> probabilities that are too close to 0 or 1 sometimes cause numerical
> > > >>>>>>> problems in his Matlab code of the same function. With
> > > >>>>>>> Optim.optimize(), it seems to be the other way around, i.e. moving
> > > >>>>>>> the probabilities further away from 0 or 1 (even by tiny amounts)
> > > >>>>>>> means that the stability of the (gradient-based) algorithm is
> > > >>>>>>> reduced.
> > > >>>>>>> 
> > > >>>>>>> So for me, the problem is solved. The problem was not with Optim.jl,
> > > >>>>>>> but with my own code.
> > > >>>>>>> 
> > > >>>>>>> The only other thing that I discovered when trying out Julia and
> > > >>>>>>> Optim.jl is that the optimization is currently considerably slower
> > > >>>>>>> than Matlab's fminunc. From the Gist I provided above, are there any
> > > >>>>>>> potential performance improvements that I am missing out on?
> > > >>>>>>> 
> > > >>>>>>> Best wishes,
> > > >>>>>>> Holger
> > > >>>>>>> 
> > > >>>>>>> On Monday, 19 May 2014 14:51:16 UTC+1, John Myles White wrote:
> > > >>>>>>>> If you can, please do share an example of your code. Logit-style
> > > >>>>>>>> models are in general numerically unstable, so it would be good to
> > > >>>>>>>> see how exactly you’ve coded things up.
> > > >>>>>>>> 
> > > >>>>>>>> One thing you may be able to do is use automatic differentiation
> > > >>>>>>>> via the autodiff = true keyword to optimize, but that assumes that
> > > >>>>>>>> your objective function is written in completely pure Julia code
> > > >>>>>>>> (which means, for example, that your code must not call any of the
> > > >>>>>>>> functions provided by Distributions.jl that are not written in
> > > >>>>>>>> Julia).
> > > >>>>>>>> 
> > > >>>>>>>>  — John
> > > >>>>>>>> 
> > > >>>>>>>> On May 19, 2014, at 4:09 AM, Andreas Noack Jensen <[email protected]> wrote:
> > > >>>>>>>> 
> > > >>>>>>>> What is the output of versioninfo() and Pkg.installed("Optim")?
> > > >>>>>>>> Also, would it be possible to make a gist with your code?
> > > >>>>>>>> 
> > > >>>>>>>> 2014-05-19 12:44 GMT+02:00 Holger Stichnoth <[email protected]>:
> > > >>>>>>>>> Hello,
> > > >>>>>>>>> 
> > > >>>>>>>>> I installed Julia a couple of days ago and was impressed how easy
> > > >>>>>>>>> it was to make the switch from Matlab and to parallelize my code
> > > >>>>>>>>> (something I had never done before in any language; I'm an
> > > >>>>>>>>> economist with only limited programming experience, mainly in
> > > >>>>>>>>> Stata and Matlab).
> > > >>>>>>>>> 
> > > >>>>>>>>> However, I ran into a problem when using Optim.jl for Maximum
> > > >>>>>>>>> Likelihood estimation of a conditional logit model. With the
> > > >>>>>>>>> default Nelder-Mead algorithm, optimize from the Optim.jl package
> > > >>>>>>>>> gave me the same result that I had obtained in Stata and Matlab.
> > > >>>>>>>>> 
> > > >>>>>>>>> With gradient-based methods such as BFGS, however, the algorithm
> > > >>>>>>>>> jumped from the starting values to parameter values that are
> > > >>>>>>>>> completely different. This happened for all the starting values I
> > > >>>>>>>>> tried, including the case in which I took a vector that is close
> > > >>>>>>>>> to the optimum from the Nelder-Mead algorithm.
> > > >>>>>>>>> 
> > > >>>>>>>>> The problem seems to be that the algorithm tried values so large
> > > >>>>>>>>> (in absolute value) that this caused problems for the objective
> > > >>>>>>>>> function, where I call exponential functions into which these
> > > >>>>>>>>> parameter values enter. As a result, the optimization based on the
> > > >>>>>>>>> BFGS algorithm did not produce the expected optimum.
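
The overflow described above is commonly avoided with the log-sum-exp trick, which is a standard technique and not something proposed in this thread. A minimal sketch, assuming a current Julia 1.x; `log_choice_probs` is a hypothetical name:

```julia
# Log choice probabilities for one choice situation, given the utilities u[j].
# Subtracting the maximum first keeps every argument of exp() at or below zero.
function log_choice_probs(u)
    m = maximum(u)
    lse = m + log(sum(exp, u .- m))   # log(sum(exp.(u))) without overflow
    return u .- lse
end

u = [1000.0, 999.0, 998.0]             # exp(1000.0) overflows to Inf in Float64
naive = log.(exp.(u) ./ sum(exp.(u)))  # Inf / Inf gives NaN
stable = log_choice_probs(u)

println(naive)
println(stable)
```

Working with log probabilities directly also removes the need to truncate probabilities away from 0 before taking the logarithm.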
> > > >>>>>>>>> 
> > > >>>>>>>>> While I could try to provide the analytical gradient in this
> > > >>>>>>>>> simple case, I was planning to use Julia for Maximum Likelihood or
> > > >>>>>>>>> Simulated Maximum Likelihood estimation in cases where the
> > > >>>>>>>>> gradient is more difficult to derive, so it would be good if I
> > > >>>>>>>>> could make the optimizer run also with numerical gradients.
> > > >>>>>>>>> 
> > > >>>>>>>>> I suspect that my problems with optimize from Optim.jl could have
> > > >>>>>>>>> something to do with the gradient() function. In the example
> > > >>>>>>>>> below, for instance, I do not understand why the output of the
> > > >>>>>>>>> gradient function includes values such as 11470.7, given that the
> > > >>>>>>>>> function values differ only minimally.
> > > >>>>>>>>> 
> > > >>>>>>>>> Best wishes,
> > > >>>>>>>>> Holger
> > > >>>>>>>>> 
> > > >>>>>>>>> 
> > > >>>>>>>>> julia> Optim.gradient(clogit_ll,zeros(4))
> > > >>>>>>>>> 6.0554544523933395e-6
> > > >>>>>>>>> 0
> > > >>>>>>>>> 0
> > > >>>>>>>>> 0
> > > >>>>>>>>> 
> > > >>>>>>>>> 14923.564009972584
> > > >>>>>>>>> -6.0554544523933395e-6
> > > >>>>>>>>> 0
> > > >>>>>>>>> 0
> > > >>>>>>>>> 0
> > > >>>>>>>>> 
> > > >>>>>>>>> 14923.565228435104
> > > >>>>>>>>> 0
> > > >>>>>>>>> 6.0554544523933395e-6
> > > >>>>>>>>> 0
> > > >>>>>>>>> 0
> > > >>>>>>>>> 
> > > >>>>>>>>> 14923.569064311248
> > > >>>>>>>>> 0
> > > >>>>>>>>> -6.0554544523933395e-6
> > > >>>>>>>>> 0
> > > >>>>>>>>> 0
> > > >>>>>>>>> 
> > > >>>>>>>>> 14923.560174904109
> > > >>>>>>>>> 0
> > > >>>>>>>>> 0
> > > >>>>>>>>> 6.0554544523933395e-6
> > > >>>>>>>>> 0
> > > >>>>>>>>> 
> > > >>>>>>>>> 14923.63413848258
> > > >>>>>>>>> 0
> > > >>>>>>>>> 0
> > > >>>>>>>>> -6.0554544523933395e-6
> > > >>>>>>>>> 0
> > > >>>>>>>>> 
> > > >>>>>>>>> 14923.495218282553
> > > >>>>>>>>> 0
> > > >>>>>>>>> 0
> > > >>>>>>>>> 0
> > > >>>>>>>>> 6.0554544523933395e-6
> > > >>>>>>>>> 
> > > >>>>>>>>> 14923.58699717058
> > > >>>>>>>>> 0
> > > >>>>>>>>> 0
> > > >>>>>>>>> 0
> > > >>>>>>>>> -6.0554544523933395e-6
> > > >>>>>>>>> 
> > > >>>>>>>>> 14923.54224130672
> > > >>>>>>>>> 
> > > >>>>>>>>> 4-element Array{Float64,1}:
> > > >>>>>>>>>   -100.609
> > > >>>>>>>>>    734.0
> > > >>>>>>>>>  11470.7
> > > >>>>>>>>>   3695.5
> > > >>>>>>>>> 
> > > >>>>>>>>> function clogit_ll(beta::Vector)
> > > >>>>>>>>>     # Print the parameters and the return value to
> > > >>>>>>>>>     # check how gradient() and optimize() work.
> > > >>>>>>>>>     println(beta)
> > > >>>>>>>>>     println(-sum(compute_ll(beta,T,0)))
> > > >>>>>>>>> 
> > > >>>>>>>>>     # compute_ll computes the individual likelihood contributions
> > > >>>>>>>>>     # in the sample. T is the number of periods in the panel. The 0
> > > >>>>>>>>>     # is not used in this simple example. In related functions, I
> > > >>>>>>>>>     # pass on different values here to estimate finite mixtures of
> > > >>>>>>>>>     # the conditional logit model.
> > > >>>>>>>>>     return -sum(compute_ll(beta,T,0))
> > > >>>>>>>>> end
> > > >>>>>>>> 
> > > >>>>>>>> Andreas Noack Jensen

# ll2: negative log-likelihood of a conditional logit with six alternatives.
# x1..x6 are indexed linearly as (i, k, t); y1..y6 are indexed as [i, t].
function ll2(coeff, x1, x2, x3, x4, x5, x6,
             y1, y2, y3, y4, y5, y6,
             T, nObs, nVar)

    llit::Float64 = 0.0   # running sum of the log-likelihood
    ni = size(x1, 1)
    nk = size(x1, 2)

    denom::Float64 = 0.0
    u1::Float64 = 0.0
    u2::Float64 = 0.0
    u3::Float64 = 0.0
    u4::Float64 = 0.0
    u5::Float64 = 0.0
    u6::Float64 = 0.0
    p1::Float64 = 0.0
    p2::Float64 = 0.0
    p3::Float64 = 0.0
    p4::Float64 = 0.0
    p5::Float64 = 0.0
    p6::Float64 = 0.0

    for i = 1:nObs
        for t = 1:T
            # Reset the utilities of the six alternatives.
            u1 = 0.0; u2 = 0.0; u3 = 0.0; u4 = 0.0; u5 = 0.0; u6 = 0.0

            for k = 1:nVar
                jj = i + ni*(k-1 + nk*(t-1))   # linear index of element (i, k, t)
                u1 = u1 + x1[jj]*coeff[k]
                u2 = u2 + x2[jj]*coeff[k]
                u3 = u3 + x3[jj]*coeff[k]
                u4 = u4 + x4[jj]*coeff[k]
                u5 = u5 + x5[jj]*coeff[k]
                u6 = u6 + x6[jj]*coeff[k]
            end

            denom = exp(u1) +
                    exp(u2) +
                    exp(u3) +
                    exp(u4) +
                    exp(u5) +
                    exp(u6)

            # Logit choice probabilities
            p1 = exp(u1)/denom
            p2 = exp(u2)/denom
            p3 = exp(u3)/denom
            p4 = exp(u4)/denom
            p5 = exp(u5)/denom
            p6 = exp(u6)/denom

            llit +=
                log(p1)*y1[i,t] +
                log(p2)*y2[i,t] +
                log(p3)*y3[i,t] +
                log(p4)*y4[i,t] +
                log(p5)*y5[i,t] +
                log(p6)*y6[i,t]
        end
    end
    return -llit
end
