Hi all,

I've just discovered Julia in the last month and have been greatly enjoying 
using it, especially for its MATLAB-like linear algebra notation and 
all-round concise, intuitive syntax.

I've been playing with its optimisation functions, looking to implement 
gradient descent for logistic regression, but I've hit a couple of stumbling 
blocks and was wondering how others have dealt with them.

Using Optim, I implemented regularized logistic regression with :l_bfgs, and 
although it sometimes worked, when I stress-tested it with some k-fold 
validation I got linesearch errors.

I've got a dataset that's about 600 x 100 (m x n) with weights w and 
classes y.
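
For context, the variables my code assumes look roughly like this (X and y 
here are just stand-ins for my real data, and logistic is my own helper):

    m, n = 600, 100
    X = randn(m, n)             # stand-in for my real feature matrix
    y = float(rand(0:1, m))     # stand-in for my real 0/1 class labels
    w = zeros(n)                # initial weights
    reg = 1.0                   # regularization strength
    logistic(z) = 1 ./ (1 + exp(-z))   # my sigmoid helper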

my code:
    using Optim

    function J(w)
        m, n = size(X)
        return sum(-y'*log(logistic(X*w)) - (1-y')*log(1-logistic(X*w))) +
            reg/(2m) * sum(w.^2)   # note normalizing bias weight
    end

    function g!(w, storage)
        storage[:] = X' * (logistic(X*w) - y) + reg/m * w
    end

    out = optimize(J, g!, w, method = :l_bfgs, show_trace = true)


the error:

Iter     Function value   Gradient norm 
...
    19    -9.034225e+02     2.092807e+02
    20    -9.034225e+02     2.092807e+02
    21    -9.034225e+02     2.092807e+02
    22    -9.034225e+02     2.092807e+02
    23    -9.034225e+02     2.092807e+02

Linesearch failed to converge
while loading In[6], in expression starting on line 2

 in hz_linesearch! at 
/home/ryan/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:374
 in hz_linesearch! at 
/home/ryan/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:188
 in l_bfgs at /home/ryan/.julia/v0.3/Optim/src/l_bfgs.jl:165
 in optimize at /home/ryan/.julia/v0.3/Optim/src/optimize.jl:340
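
One thing I'm considering is relaxing the stopping criteria and raising the 
iteration cap, roughly like this (I'm going from the Optim README from memory, 
so the keyword names ftol/grtol/iterations may be off):

    # keyword names are my best recollection; check the Optim docs
    out = optimize(J, g!, w, method = :l_bfgs,
                   ftol = 1e-6, grtol = 1e-6, iterations = 500,
                   show_trace = true)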


Should I be overriding the convergence criteria along those lines? Or is there 
a bug in my code? Anyway, I thought I might have more luck with conjugate 
gradient descent, so I included types.jl and cg.jl from the Optim package and 
tried to make that work too, defining a DifferentiableFunction:


    function rosenbrock(g, x::Vector)
        d1 = 1.0 - x[1]
        d2 = x[2] - x[1]^2
        if !(g === nothing)
            g[1] = -2.0*d1 - 400.0*d2*x[1]
            g[2] = 200.0*d2
        end
        return d1^2 + 100.0*d2^2
    end

    function rosenbrock_gradient!(x::Vector, storage::Vector)
        storage[1] = -2.0 * (1.0 - x[1]) - 400.0 * (x[2] - x[1]^2) * x[1]
        storage[2] = 200.0 * (x[2] - x[1]^2)
    end

    cg(rosenbrock, [0, 0])

    d2 = DifferentiableFunction(rosenbrock, rosenbrock_gradient!)
    cg(d2, [0, 0])

ERROR: InexactError()
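
My only guess so far is that the integer starting point is to blame (the float 
updates presumably can't be stored back into an Int array), so a Float64 
initial vector might behave differently:

    cg(d2, [0.0, 0.0])   # guessing a Float64 start avoids the InexactError

I haven't confirmed that, though.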


I tried a few variations on the function 'cg' before coming here for help. 
I notice there are a couple of other optimization packages out there, but 
this one is by JMW and looks good.

Obviously, if I only wanted to perform linear regression I could use a 
built-in function, but for more complex models I need to be able to do 
gradient descent myself.
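
Worst case, I suppose I could fall back to a hand-rolled fixed-step loop, 
something like this sketch (step size picked by hand, no line search), though 
I'd much rather lean on Optim:

    # naive fallback: fixed-step gradient descent using the g! defined above
    function graddesc(g!, w0; alpha = 0.01, iters = 1000)
        w = copy(w0)
        storage = similar(w)
        for i = 1:iters
            g!(w, storage)        # fill storage with the gradient at w
            w -= alpha * storage  # fixed-size step downhill
        end
        return w
    end

    w_hat = graddesc(g!, zeros(size(X, 2)))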

How have others fared with Optim? Any thoughts on what's going wrong? 
General tips for how to make gradient descent work with Julia?


