That's not quite right: the highest-performance case requires that you have three separate functions:

(1) f, which calculates only the objective function, as a pure function returning a Float64 value
(2) g!, which calculates only the gradient, by mutating a Float64 array
(3) fg!, which calculates both f and g! at the same time

At some point, I'd like to revise the Optim interface so that you provide a single function that decides whether to calculate a gradient based on the size (zero or non-zero) of the input array for the gradient. This is the interface for NLopt, so it would be nice if functions could be moved between the two libraries easily.

Looking at your sample code, I think the real performance gains require that you do some devectorization. Optim, for example, assumes that you're going to mutate your gradient vector, rather than reallocate it on every iteration (as your DeltaW implies).

-- John

On Aug 13, 2014, at 5:30 PM, Andre P. <[email protected]> wrote:

> I'm porting some Matlab code to Julia.
>
> The optimization objective function evaluates the main cost function and its
> gradient simultaneously. Some of the interim calculations from the cost
> function are plugged into the gradient calculation to avoid making the same
> calculation twice. Here is the actual function.
>
> function SparseFilteringObj(W, X, N)
>
>     # Reshape W into matrix form
>     W = reshape(W, (N, size(X, 1)))
>
>     # Feed forward
>     F = W * X                  # Linear activation
>     Fs = sqrt(F.^2 + 1e-8)     # Soft-absolute activation
>     NFs, L2Fs = l2row(Fs)      # Normalize by rows
>     Fhat, L2Fn = l2row(NFs')   # Normalize by columns
>
>     # Compute objective function
>     Obj = sum(sum(Fhat, 2), 1)
>
>     # Backprop through each feedforward step
>     DeltaW = l2grad(NFs', Fhat, L2Fn, ones(size(Fhat)))
>     DeltaW = l2grad(Fs, NFs, L2Fs, DeltaW')
>     DeltaW = (DeltaW .* (F ./ Fs)) * X'
>     DeltaW = DeltaW[:]
>
>     return Obj, DeltaW
> end
>
> This is my first time using Optim.jl.
> It seems the interface requires that the objective function be separated
> into a cost function and a gradient function, but it also says that I can
> get better performance by providing a third function,
> DifferentiableFunction(f, g!), that calculates both of these simultaneously.
> So, as I understand it, I have to split them up, and then re-combine them
> using DifferentiableFunction(f, g!) to get better performance. Is this
> correct?
>
> Any suggestions on how to split this in a way that avoids duplicating
> calculation? Should I do all the calculations that are shared as inline
> calcs, perhaps? It feels like I'm missing some easy solution. Any advice
> would be appreciated.
>
> Gist of the as-yet-incomplete port:
> https://gist.github.com/Andy-P/5c88e524d46a3749ba5f
>
> Original Matlab code:
> http://cs.stanford.edu/~jngiam/papers/NgiamKohChenBhaskarNg2011_Supplementary.pdf
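The f / g! / fg! split described above can be sketched with a toy problem. This is a hypothetical example, not Andre's sparse-filtering code: a least-squares objective f(x) = sum((A*x - b).^2), whose gradient 2*A'*(A*x - b) reuses the residual A*x - b. fg! computes the residual once and shares it between the value and the gradient, while f and g! recompute it when called on their own; g! and fg! mutate the `storage` array in place rather than allocating a new gradient each call, which is the behavior Optim expects. These are the three functions one would hand to DifferentiableFunction(f, g!, fg!).

```julia
# Hypothetical shared-computation example (A, b, f, g!, fg! are
# illustrative names, not from the original post).
A = [1.0 2.0; 3.0 4.0; 5.0 6.0]
b = [1.0, 2.0, 3.0]

# f: pure objective, returns a Float64
f(x) = sum((A*x - b).^2)

# g!: gradient only, mutating `storage` in place
function g!(x, storage)
    r = A*x - b                  # residual recomputed: gradient requested alone
    storage[:] = 2.0 * (A' * r)  # fill the caller's array; no reallocation
    return storage
end

# fg!: objective and gradient together, sharing the residual
function fg!(x, storage)
    r = A*x - b                  # computed once, used by both outputs
    storage[:] = 2.0 * (A' * r)
    return sum(r.^2)             # fg! returns the objective value
end

grad = zeros(2)
val = fg!([0.0, 0.0], grad)      # val = 14.0, grad = [-44.0, -56.0]
```

The same caching idea carries over to the sparse-filtering objective: put the forward pass (F, Fs, NFs, Fhat) in fg! and reuse those intermediates for the backprop step, while f and g! each run only the parts they need.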
