‘Very’ new Julia user here. I wanted to highlight something and suggest a
modification to the Optim documentation. From the perspective of a new user
it isn’t really clear how to specify the gradient. I know the Rosenbrock
function is the standard example in optimization docs, but in my opinion
it would make life a little easier for people new to Julia to:
1- Have a more explicit example that defines the gradient and reinforces
the concept of ‘storage’.
2- Have an example of an optimization that shows how to deal with
objective functions that take multiple arguments, since the typical
‘args’ argument is missing from optimize.
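On point 2, the idiomatic Julia replacement for a SciPy/Matlab-style ‘args’ parameter is a closure (an anonymous function that captures the extra data). A minimal sketch, assuming a made-up two-argument objective `myloss(theta, data)` (not from the Optim docs):

```julia
using Optim

# hypothetical objective that needs extra data beyond the parameter vector
myloss(theta, data) = sum((data .- theta[1]).^2)

data = [1.0, 2.0, 3.0]

# the closure captures `data`, so optimize only ever sees a
# one-argument function of theta
f = theta -> myloss(theta, data)

# no analytic gradient supplied; Optim falls back to finite differences
result = optimize(f, [0.0], BFGS())
```

This is the pattern the example further down uses as well: `f` and `g!` close over `X`, `y`, and `lam` rather than receiving them as extra arguments.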
I think the above two features of Julia are distinctive enough to justify
going beyond the standard way of presenting optimization docs. I’m working
through a Coursera machine learning course taught by Andrew Ng, and thought
the following problem would be a bit more explicit about how to define the
gradient.
By the way, Julia is great. I’m in the process of learning it and switching
from a mix of R and Matlab. The following is just a crude translation of
some Matlab code. (Credit to someone on Stack Overflow for helping with
solving my question about how to properly call the gradient.)
using Optim
# logistic (sigmoid) function, broadcast over arrays
sigmoid(z) = 1 ./ (1 .+ exp.(-z))
# cost function to minimize, with theta the vector to be optimized
function costFunc_logistic(theta, X, y, lam)
    m = length(y)
    regularization = sum(theta[2:end].^2) * lam / (2 * m)
    h = sigmoid(X * theta)
    return sum(-y .* log.(h) .- (1 .- y) .* log.(1 .- h)) / m + regularization
end
# gradient definition (no `!`: it allocates and returns, mutating nothing)
function costFunc_logistic_gradient(theta, X, y, lam, m)
    grad = X' * (sigmoid(X * theta) .- y) ./ m
    grad[2:end] = grad[2:end] .+ theta[2:end] .* lam / m
    return grad
end
# X, y, lam, m, and theta are assumed to be defined in the enclosing scope
f(theta::Array) = costFunc_logistic(theta, X, y, lam)
# in-place gradient: Optim passes a storage array to be filled
function g!(storage::Array, theta::Array)
    storage[:] = costFunc_logistic_gradient(theta, X, y, lam, m)
end
optimize(f, g!, theta, LBFGS())
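For anyone who wants to run this end to end, here is a self-contained sketch with made-up toy data (the random `X` and `y` below are placeholders, not from the course assignment). It assumes a recent Optim version, where the gradient function mutates its first argument and the method is passed as `LBFGS()`:

```julia
using Optim, Random

Random.seed!(1)

sigmoid(z) = 1 ./ (1 .+ exp.(-z))

# toy data: 100 observations, intercept column plus 2 random features
n = 100
X = [ones(n) randn(n, 2)]
y = Float64.(rand(n) .> 0.5)
lam = 1.0
m = length(y)

# regularized logistic cost; closes over X, y, lam, m
f = theta -> begin
    h = sigmoid(X * theta)
    sum(-y .* log.(h) .- (1 .- y) .* log.(1 .- h)) / m +
        lam / (2m) * sum(theta[2:end].^2)
end

# in-place gradient: fill the storage array Optim hands us
g! = (storage, theta) -> begin
    storage[:] = X' * (sigmoid(X * theta) .- y) ./ m
    storage[2:end] .+= lam / m .* theta[2:end]
end

res = optimize(f, g!, zeros(3), LBFGS())
Optim.minimizer(res)
```

The intercept term `theta[1]` is deliberately left out of both the penalty and the gradient’s regularization row, matching the convention in the course.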
Just thought I’d share my two cents.