Hi all,
I've just discovered Julia this past month and have been greatly enjoying
using it, especially for its MATLAB-like linear algebra notation and
all-round concise, intuitive syntax.
I've been playing with its optimisation functions, looking to implement
gradient descent for logistic regression, but I hit a couple of stumbling
blocks and was wondering how others have managed them.
Using Optim, I implemented regularized logistic regression with l_bfgs, and
although it worked some of the time, when I stress-tested it with k-fold
cross-validation I got linesearch errors.
My dataset X is about 600 x 100 (m x n), with weights w and classes y.
My code:
function J(w)
    m, n = size(X)
    return sum(-y' * log(logistic(X*w)) - (1 - y') * log(1 - logistic(X*w))) +
           reg/(2m) * sum(w.^2)   # note: normalizing the bias weight
end

function g!(w, storage)
    storage[:] = X' * (logistic(X*w) - y) + reg/m * w
end

out = optimize(J, g!, w, method = :l_bfgs, show_trace = true)
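For context, logistic, X, y, w, reg and m are all defined at the top level,
roughly like the following (the random data here is only a stand-in for my
actual 600 x 100 dataset):

using Optim

logistic(z) = 1 ./ (1 + exp(-z))   # elementwise sigmoid

X = [ones(600) rand(600, 99)]      # ~600 x 100 design matrix, first column is the bias
y = float(rand(0:1, 600))          # 0/1 class labels
m, n = size(X)
reg = 1.0                          # regularization strength
w = zeros(n)                       # initial weights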
The error:
Iter Function value Gradient norm
...
19 -9.034225e+02 2.092807e+02
20 -9.034225e+02 2.092807e+02
21 -9.034225e+02 2.092807e+02
22 -9.034225e+02 2.092807e+02
23 -9.034225e+02 2.092807e+02
Linesearch failed to converge
while loading In[6], in expression starting on line 2
in hz_linesearch! at
/home/ryan/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:374
in hz_linesearch! at
/home/ryan/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:188
in l_bfgs at /home/ryan/.julia/v0.3/Optim/src/l_bfgs.jl:165
in optimize at /home/ryan/.julia/v0.3/Optim/src/optimize.jl:340
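Looking at the trace, the gradient norm just stalls rather than shrinking
below the default tolerance, so one thought was to loosen the stopping
criteria, roughly like this (the keyword names grtol / ftol / iterations are
my best guess from the Optim README, so they may be off):

# Keyword names here are assumptions based on the README of the version I have;
# adjust if your Optim exposes different ones.
out = optimize(J, g!, w,
               method = :l_bfgs,
               grtol = 1e-6,      # stop once the gradient norm falls below this
               ftol = 1e-8,       # or once the objective stops changing much
               iterations = 500,
               show_trace = true)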
Perhaps I should override the convergence criteria, something like the call
sketched above? Or is there a bug in my code? Anyway, I thought I might have
more luck with conjugate gradient descent, so I included types.jl and cg.jl
from the Optim package and tried to make that work too, defining a
DifferentiableFunction:
function rosenbrock(g, x::Vector)
    d1 = 1.0 - x[1]
    d2 = x[2] - x[1]^2
    if !(g === nothing)
        g[1] = -2.0 * d1 - 400.0 * d2 * x[1]
        g[2] = 200.0 * d2
    end
    val = d1^2 + 100.0 * d2^2
    return val
end
function rosenbrock_gradient!(x::Vector, storage::Vector)
    storage[1] = -2.0 * (1.0 - x[1]) - 400.0 * (x[2] - x[1]^2) * x[1]
    storage[2] = 200.0 * (x[2] - x[1]^2)
end
cg(rosenbrock, [0, 0])
d2 = DifferentiableFunction(rosenbrock, rosenbrock_gradient!)
cg(d2, [0, 0])
ERROR: InexactError()
I tried a few variations on the function 'cg' before coming here for help.
I notice there are a couple of other optimization packages out there, but
this one is by JMW and looks good.
Obviously, if I only wanted to perform linear regression I could use a
built-in function, but to use more complex models I need to be able to get
gradient descent working.
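For now my fallback would be a plain fixed-step loop along these lines (step
size and iteration count picked by hand, so treat it as a sketch rather than
anything tuned):

# Naive batch gradient descent, reusing the same g! as above.
function gradient_descent(g!, w0; alpha = 0.01, iters = 1000)
    w = copy(w0)
    grad = similar(w)
    for i in 1:iters
        g!(w, grad)          # fill grad with the gradient at w
        w -= alpha * grad    # take a fixed-size step downhill
    end
    return w
end

w_hat = gradient_descent(g!, zeros(size(X, 2)))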
How have others fared with Optim? Any thoughts on what's going wrong?
General tips for how to make gradient descent work with Julia?