‘Very’ new Julia user here. I wanted to highlight, and maybe suggest a 
modification to, the Optim documentation. From the perspective of a new user 
it isn’t really clear how to specify the gradient. I know the Rosenbrock 
function is the standard example in optimization docs, but in my opinion it 
would make life a little easier for people new to Julia to:

1- Have a more explicit example that defines the gradient and reinforces the 
concept of ‘storage’

2- Have an example that shows how to handle objective functions that take 
multiple arguments, since the typical ‘args’ argument (as in e.g. scipy’s 
minimize) is missing from optimize (see the closure sketch just after this 
list)
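
To illustrate item 2, here is a minimal sketch of the closure pattern. The 
names rss, X, y, beta, and b are made up for illustration and are not part 
of the logistic regression example further down:

using Optim

# toy objective that needs data arguments beyond the parameter vector
rss(beta, X, y) = sum(abs2, y .- X * beta)

X = randn(100, 3)
y = randn(100)

# optimize has no 'args...' parameter; instead, capture the extra
# arguments in an anonymous function (a closure) around the objective
res = optimize(b -> rss(b, X, y), zeros(3))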

I think the above two features of Julia are unique enough to justify going 
beyond the standard template for optimization docs. I’m working through a 
Coursera machine learning course taught by Andrew Ng, and thought the 
following problem (regularized logistic regression) would make it a bit more 
explicit how to define the gradient. 

By the way, Julia is great. I’m in the process of learning it and switching 
over from a mix of R and Matlab. The following is just a crude translation 
of some Matlab code. (Credit to someone on Stack Overflow for helping me 
figure out how to properly call the gradient.)

 
using Optim

# the sigmoid/logistic helper used below is not provided by Optim, so define it
sigmoid(z) = 1 ./ (1 .+ exp.(-z))
# cost function to minimize, with theta the parameter vector
function costFunc_logistic(theta, X, y, lam)
    m = length(y)
    regularization = sum(theta[2:end].^2) * lam / (2 * m)
    h = sigmoid(X * theta)
    return sum(-y .* log.(h) .- (1 .- y) .* log.(1 .- h)) / m + regularization
end
 
# gradient definition (allocates and returns a new vector, so no '!')
function costFunc_logistic_gradient(theta, X, y, lam, m)
    grad = X' * (sigmoid(X * theta) .- y) ./ m
    grad[2:end] .+= theta[2:end] .* lam ./ m   # no regularization on the intercept
    return grad
end

# closure over the data: X, y, and lam are picked up from the surrounding scope
f(theta::Array) = costFunc_logistic(theta, X, y, lam) 

# gradient 'storage': Optim hands g! a preallocated array to fill in place
# (current Optim convention: the storage array comes first, then the point)
function g!(storage::Array, theta::Array)
    storage[:] = costFunc_logistic_gradient(theta, X, y, lam, length(y))
end
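
For completeness, here is some made-up dummy data (not from the original 
post) so the whole snippet can run end to end:

m = 100
X = [ones(m) randn(m, 2)]    # design matrix with an intercept column
y = 1.0 * rand(0:1, m)       # fake 0/1 labels
lam = 1.0                    # regularization strength
theta = zeros(size(X, 2))    # starting values for the optimizer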

res = optimize(f, g!, theta, LBFGS())   # older Optim versions used: method = :l_bfgs
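
If that runs, Optim.minimizer(res) should extract the fitted theta and 
Optim.minimum(res) the final cost (accessor names per current Optim).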

Just thought I’d share my two cents.
