Here's my two, not very thorough, cents: (1) The odds of a bug in Optim.jl are very high (>90%). (2) The odds of a bug in your code are very high (>90%).
It's pretty easy to make a decision about (2). Deciding on (1) is a lot harder, since you need a specific optimization problem that Optim should solve, but fails to solve.

For resolving (2), you have a couple of sub-problems:

(a) Is your gradient analytically correct? You can check this by comparing it with finite differencing. If it doesn't produce a close match, be suspicious.

(b) Is your log likelihood + gradient numerically correct? Your stress test is, in theory, an attempt to test this. But if the issue is numerical instability, failures will only show up on instances that are numerically hard. So you'd want to measure the correlation between the difficulty of the problem and the probability of failure.

My experience is that the Optim error messages don't make it easy to realize when you've made a mistake in your gradients. This is being worked on at the moment, but I think someone would need to dedicate a week to it to get us to a point where the error messages are always clear.

 -- John

On Sunday, February 15, 2015 at 3:29:35 PM UTC-8, Ryan Carey wrote:
>
> Hi all,
>
> I've just discovered Julia this last month, and have been greatly enjoying
> using it, especially because of its matlab-like linear algebra notation and
> all-round concise and intuitive syntax.
>
> I've been playing with its optimisation functions, looking to implement
> gradient descent for logistic regression, but I hit a couple of stumbling
> blocks, and was wondering how you've managed these.
>
> Using Optim, I implemented regularized logistic regression with l_bfgs,
> and although it worked some of the time, when I stress-tested it with some
> k-fold validation, I got linesearch errors.
>
> I've got a dataset that's about 600 x 100 (m x n) with weights w and
> classes y.
>
> my code:
>
> function J(w)
>     m,n = size(X)
>     return sum(-y'*log(logistic(X*w)) - (1-y')*log(1-logistic(X*w))) +
>            reg/(2m) * sum(w.^2)   # note normalizing bias weight
> end
>
> function g!(w,storage)
>     storage[:] = X' * (logistic(X*w) - y) + reg / m * w
> end
>
> out = optimize(J, g!, w, method = :l_bfgs, show_trace=true)
>
> the error:
>
> Iter     Function value    Gradient norm
> ...
> 19       -9.034225e+02     2.092807e+02
> 20       -9.034225e+02     2.092807e+02
> 21       -9.034225e+02     2.092807e+02
> 22       -9.034225e+02     2.092807e+02
> 23       -9.034225e+02     2.092807e+02
>
> Linesearch failed to converge
> while loading In[6], in expression starting on line 2
>
>  in hz_linesearch! at /home/ryan/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:374
>  in hz_linesearch! at /home/ryan/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:188
>  in l_bfgs at /home/ryan/.julia/v0.3/Optim/src/l_bfgs.jl:165
>  in optimize at /home/ryan/.julia/v0.3/Optim/src/optimize.jl:340
>
> Perhaps I should override its convergence criteria? Or there's a bug in my
> code? Anyway, I thought I might have more luck with conjugate gradient
> descent, so I included types.jl and cg.jl from the Optim package and tried
> to make it work too, defining a DifferentiableFunction type:
>
> function rosenbrock(g, x::Vector)
>     d1 = 1.0 - x[1]
>     d2 = x[2] - x[1]^2
>     if !(g === nothing)
>         g[1] = -2.0*d1 - 400.0*d2*x[1]
>         g[2] = 200.0*d2
>     end
>     val = d1^2 + 100.0 * d2^2
>     return val
> end
>
> function rosenbrock_gradient!(x::Vector, storage::Vector)
>     storage[1] = -2.0 * (1.0 - x[1]) - 400.0 * (x[2] - x[1]^2) * x[1]
>     storage[2] = 200.0 * (x[2] - x[1]^2)
> end
>
> cg(rosenbrock, [0,0])
>
> d2 = DifferentiableFunction(rosenbrock, rosenbrock_gradient!)
> cg(d2, [0,0])
> ERROR: InexactError()
>
> I tried a few variations on the function 'cg' before coming here for help.
> I notice that there are a couple of other optimization packages out there,
> but this one is by JMW and looks good.
>
> Obviously, if I just wanted to perform linear regression, I could just use
> a built-in function, but to use more complex models, I would need to be
> able to do gradient descent.
>
> How have others fared with Optim? Any thoughts on what's going wrong?
> General tips for how to make gradient descent work with Julia?
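[Editor's note on point (a) above: the finite-differencing check John suggests can be sketched in a few lines. The snippet below is a minimal Python illustration (the thread's code is Julia); the tiny dataset, the `reg` value, and the helper names are made up for the example, but the cost and gradient mirror the `J(w)` and `g!` in the post.]

```python
import math

def logistic(z):
    # numerically stable sigmoid
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def J(w, X, y, reg):
    # regularized negative log likelihood, mirroring J(w) in the post
    m = len(X)
    total = 0.0
    for xi, yi in zip(X, y):
        z = sum(wj * xij for wj, xij in zip(w, xi))
        p = logistic(z)
        total += -yi * math.log(p) - (1 - yi) * math.log(1 - p)
    return total + reg / (2 * m) * sum(wj ** 2 for wj in w)

def grad(w, X, y, reg):
    # analytic gradient: X' * (logistic(X*w) - y) + reg/m * w
    m, n = len(X), len(w)
    g = [reg / m * wj for wj in w]
    for xi, yi in zip(X, y):
        z = sum(wj * xij for wj, xij in zip(w, xi))
        err = logistic(z) - yi
        for j in range(n):
            g[j] += err * xi[j]
    return g

def fd_grad(f, w, h=1e-6):
    # central finite differences, one coordinate at a time
    g = []
    for j in range(len(w)):
        wp = list(w); wp[j] += h
        wm = list(w); wm[j] -= h
        g.append((f(wp) - f(wm)) / (2 * h))
    return g

# tiny made-up problem, just to exercise the check
X = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0], [1.0, 0.1]]
y = [1, 0, 1, 0]
w = [0.3, -0.7]
reg = 0.1

ga = grad(w, X, y, reg)
gn = fd_grad(lambda v: J(v, X, y, reg), w)
max_err = max(abs(a - b) for a, b in zip(ga, gn))
print(max_err)  # should be tiny (well below 1e-6) if the gradient is right
```

If `max_err` is large at even one randomly chosen `w`, the analytic gradient is wrong and no amount of tuning the optimizer will help.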
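[Editor's note on point (b): a common source of exactly this kind of instability is computing log(logistic(X*w)) naively, as the posted J(w) does. For large |z|, logistic(z) underflows to 0 or rounds to 1, and the log produces Inf/NaN, which can make a linesearch fail only on hard folds of a stress test. The standard rewrite via log1p is sketched below in Python for illustration; `log_logistic` is a hypothetical helper name, not an Optim.jl function.]

```python
import math

def log_logistic(z):
    # log(logistic(z)) without forming logistic(z) first:
    #   z >= 0:  -log(1 + exp(-z))
    #   z <  0:   z - log(1 + exp(z))
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def naive_log_logistic(z):
    # the direct translation of log(logistic(z)) from the post
    return math.log(1.0 / (1.0 + math.exp(-z)))

# moderate z: both versions agree
print(abs(log_logistic(3.0) - naive_log_logistic(3.0)))  # essentially 0

# extreme z: the stable version stays finite...
print(log_logistic(-800.0))  # -800.0

# ...while the naive version blows up (exp(800) overflows)
try:
    naive_log_logistic(-800.0)
    naive_failed = False
except (OverflowError, ValueError):
    naive_failed = True
print("naive version failed:", naive_failed)
```

The same trick covers the second term of the likelihood, since log(1 - logistic(z)) = log_logistic(-z).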
