Here's my two, not very thorough, cents:

(1) The odds of a bug in Optim.jl are very high (>90%).
(2) The odds of a bug in your code are very high (>90%).

It's pretty easy to make a decision about (2). Deciding on (1) is a lot 
harder, since you need a specific optimization problem that Optim should 
solve, but fails to solve.

For resolving (2), you have a couple of sub-problems:

(a) Is your gradient analytically correct? You can check this by comparing 
it with finite differencing. If it doesn't produce a close match, be 
suspicious.
(b) Is your log likelihood + gradient numerically correct? Your stress test 
is, in theory, an attempt to test this. But if the cause is numerical 
instability, failures should show up mainly on the inputs that are 
numerically difficult. So you'd want to measure the correlation between the 
difficulty of the problem and the probability of failure.
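For (a), the check doesn't need anything fancy. Here's a rough sketch (the helper is my own, not Optim API) that compares a central difference against the analytic gradient, using the same `f(x)` / `g!(x, storage)` calling convention as in the code quoted below:

```julia
# Rough sketch of a central-difference gradient check.
# f is the objective; g! writes the analytic gradient into storage.
function check_gradient(f, g!, x; eps = 1e-6, tol = 1e-4)
    n = length(x)
    numeric = zeros(n)
    for i in 1:n
        e = zeros(n)
        e[i] = eps
        # central difference approximates the i-th partial derivative
        numeric[i] = (f(x + e) - f(x - e)) / (2 * eps)
    end
    analytic = zeros(n)
    g!(x, analytic)
    maxerr = maximum(abs.(numeric .- analytic))  # worst-case disagreement
    return maxerr < tol, maxerr
end

# Sanity check on a quadratic whose gradient is exactly 2x:
f(x) = sum(x .^ 2)
g!(x, storage) = (storage[:] = 2 .* x)
ok, err = check_gradient(f, g!, [1.0, -2.0, 3.0])
```

If `ok` comes back false on your actual likelihood, be suspicious of the analytic gradient before blaming the optimizer.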

My experience is that Optim's error messages don't make it easy to realize 
when you've made a mistake in your gradients. This is being worked on at 
the moment, but I think someone would need to dedicate a week to it before 
the error messages are reliably clear.
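One more note on (b), since logistic regression is the canonical example: the textbook form of the log likelihood evaluates things like log(1 - logistic(z)), which rounds to log(0) = -Inf once z is moderately large, and that alone can derail a line search on the harder folds. A sketch of the usual rewrite (`log_logistic` is my own illustrative helper, not anything in Optim):

```julia
logistic(z) = 1 / (1 + exp(-z))

# Stable log(logistic(z)) = -log(1 + exp(-z)), arranged so exp never
# overflows. Since log(1 - logistic(z)) == log(logistic(-z)), this one
# helper covers both terms of the log likelihood.
log_logistic(z) = min(z, 0.0) - log1p(exp(-abs(z)))

naive  = log(1 - logistic(40.0))  # 1 - logistic(40) rounds to 0.0, so this is -Inf
stable = log_logistic(-40.0)      # finite, approximately -40.0
```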

 -- John

On Sunday, February 15, 2015 at 3:29:35 PM UTC-8, Ryan Carey wrote:
>
> Hi all,
>
> I've just discovered Julia this last month, and have been greatly enjoying 
> using it, especially because of its matlab-like linear algebra notation and 
> all-round concise and intuitive syntax.
>
> I've been playing with its optimisation functions, looking to implement 
> gradient descent for logistic regression but I hit a couple of stumbling 
> blocks, and was wondering how you've managed these.
>
> Using Optim, I implemented regularized logistic regression with l_bfgs, 
> and although it worked sometimes, when I stress-tested it with some k-fold 
> validation, I got Linesearch errors.
>
> I've got a dataset that's about 600 x 100 (m x n) with weights w and 
> classes y.
>
> my code:
>
>   function J(w)
>       m, n = size(X)
>       return sum(-y'*log(logistic(X*w)) - (1-y')*log(1-logistic(X*w))) +
>              reg/(2m) * sum(w.^2) # note normalizing bias weight
>   end
>
>   function g!(w, storage)
>       storage[:] = X' * (logistic(X*w) - y) + reg / m * w
>   end
>
>   out = optimize(J, g!, w, method = :l_bfgs, show_trace = true)
>
>
> the error:
>
> Iter     Function value   Gradient norm 
> ...
>     19    -9.034225e+02     2.092807e+02
>     20    -9.034225e+02     2.092807e+02
>     21    -9.034225e+02     2.092807e+02
>     22    -9.034225e+02     2.092807e+02
>     23    -9.034225e+02     2.092807e+02
>
> Linesearch failed to converge
> while loading In[6], in expression starting on line 2
>
>  in hz_linesearch! at 
> /home/ryan/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:374
>  in hz_linesearch! at 
> /home/ryan/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:188
>  in l_bfgs at /home/ryan/.julia/v0.3/Optim/src/l_bfgs.jl:165
>  in optimize at /home/ryan/.julia/v0.3/Optim/src/optimize.jl:340
>
>
> Perhaps I should override its convergence criteria? Or is there a bug in my 
> code? Anyway, I thought I might have more luck with conjugate gradient 
> descent, so I included types.jl and cg.jl from the Optim package, and tried 
> to make it work too, defining a DifferentiableFunction:
>
>
> function rosenbrock(g, x::Vector)
>     d1 = 1.0 - x[1]
>     d2 = x[2] - x[1]^2
>     if !(g === nothing)
>         g[1] = -2.0*d1 - 400.0*d2*x[1]
>         g[2] = 200.0*d2
>     end
>     val = d1^2 + 100.0 * d2^2
>     return val
> end
>
>
> function rosenbrock_gradient!(x::Vector, storage::Vector)
>     storage[1] = -2.0 * (1.0 - x[1]) - 400.0 * (x[2] - x[1]^2) * x[1]
>     storage[2] = 200.0 * (x[2] - x[1]^2)
> end
>
>
> cg(rosenbrock, [0, 0])
>
>
> d2 = DifferentiableFunction(rosenbrock, rosenbrock_gradient!)
>
> cg(d2, [0, 0])
>
> ERROR: InexactError()
>
>
> I tried a few variations on the function 'cg' before coming here for help. 
> I notice that there are a couple of other optimization packages out there 
> but this one is by JMW and looks good.
>
> Obviously, if I just wanted to perform linear regression, I could just use 
> a built-in function, but to use more complex models, I would need to be 
> able to do gradient descent.
>
> How have others fared with Optim? Any thoughts on what's going wrong? 
> General tips for how to make gradient descent work with Julia?
>