Some Optim algorithms, like cg, already allow you to optimize a matrix.

--Tim
On Wednesday, May 13, 2015 11:50:00 PM Lyndon White wrote:
> Hi all,
> I've been thinking about this for a while.
>
> Numerical optimisation libraries, e.g.
> NLopt (https://github.com/JuliaOpt/NLopt.jl) and Optim
> (https://github.com/JuliaOpt/Optim.jl),
> require the parameter to be optimised (x) to be a vector.
>
> In neural networks, the parameters to be optimised are weight matrices and
> bias vectors.
>
> The workaround to train a neural network with such an optimisation
> library is to pack those matrices and vectors down into a single vector when
> returning the gradient,
> and to unpack it into the matrices and vectors when asked to evaluate the
> gradient/loss.
>
> Like follows:
>
> type NN
>     W_e::Matrix{Float64}
>     b_e::Vector{Float64}
>     W_d::Matrix{Float64}
>     b_d::Vector{Float64}
> end
>
> function unpack!(nn::NN, θ::Vector)
>     W_e_len = length(nn.W_e)
>     b_e_len = length(nn.b_e)
>     W_d_len = length(nn.W_d)
>     b_d_len = length(nn.b_d)
>     W_e_shape = size(nn.W_e)
>     W_d_shape = size(nn.W_d)
>
>     nn.W_e = reshape(θ[1:W_e_len], W_e_shape)
>     nn.b_e = θ[W_e_len+1:W_e_len+b_e_len]
>     nn.W_d = reshape(θ[W_e_len+b_e_len+1:W_e_len+b_e_len+W_d_len], W_d_shape)
>     nn.b_d = θ[W_e_len+b_e_len+W_d_len+1:end]
>
>     nn
> end
>
> function pack(nn::NN)
>     pack(nn.W_e, nn.b_e, nn.W_d, nn.b_d)
> end
>
> function pack(∇W_e::Matrix{Float64}, ∇b_e::Vector{Float64},
>               ∇W_d::Matrix{Float64}, ∇b_d::Vector{Float64})
>     [∇W_e[:]; ∇b_e; ∇W_d[:]; ∇b_d]  # ';' concatenates into one flat vector
> end
>
> Then use it like:
>
> function loss_and_loss_grad!(θ::Vector, grad::Vector)
>     # NLopt and Optim both provide the grad vector to be overwritten in place
>     fill!(grad, 0)
>     # Keep a global nn to track size (and handy if the algorithm crashes)
>     unpack!(nn_outer, θ)
>
>     function loss_and_loss_grad(train_datum)
>         ∇W_e, ∇b_e, ∇W_d, ∇b_d, err = loss_and_loss_grad_single(nn_outer, train_datum)
>         [pack(∇W_e, ∇b_e, ∇W_d, ∇b_d); err]  # gradient first, error as last element
>     end
>
>     ret = map(loss_and_loss_grad, training_data) |> sum
>     grad[:] = ret[1:end-1]
>     err = ret[end]
>
>     grad[:] /= length(training_data)
>     err /= length(training_data)
>     err
> end
>
> This works.
> But it involves excessive array copies (I suspect).
>
> The order in the packed vector does not matter, so long as it is consistent.
>
> Now, memory is already linear -- matrices are vectors in memory with
> special operations defined that say how to interpret them in 2D,
> and the matrices in the composite type are adjacent in memory (I assume,
> since why not be like a C struct).
>
> So it is logically simple to just reinterpret them as a single vector.
> I don't think reinterpret works on composite types, though.
>
> In C or PL/I, this could be solved by defining the composite type as an
> untagged union of a vector and a structure.
> I don't think Julia has this facility. (It is pretty niche; this is one of
> the only times I can think of it being actually convenient.)
>
> Anyone have any suggestions?
>
> Regards
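
One way to avoid the copies is to go the other way around: allocate a single flat vector and make the weight matrices and bias vectors views into it. Array fields of a Julia composite type are references to separately allocated arrays, so they are not guaranteed to be adjacent in memory the way C struct members are. What follows is only a sketch in current Julia syntax, not something proposed in the thread; the function name `flat_nn` and the layer sizes are hypothetical.

```julia
# Hypothetical helper: build one contiguous parameter vector θ plus
# matrix/vector views into it for an autoencoder-style network.
function flat_nn(n_in::Int, n_hid::Int)
    # Shapes of W_e, b_e, W_d, b_d (illustrative sizes, not from the thread).
    shapes = ((n_hid, n_in), (n_hid,), (n_in, n_hid), (n_in,))
    θ = zeros(sum(prod, shapes))
    parts = []
    offset = 0
    for shape in shapes
        len = prod(shape)
        # view + reshape share memory with θ; nothing is copied.
        push!(parts, reshape(view(θ, offset+1:offset+len), shape))
        offset += len
    end
    W_e, b_e, W_d, b_d = parts
    return (θ = θ, W_e = W_e, b_e = b_e, W_d = W_d, b_d = b_d)
end

nn = flat_nn(3, 2)

# An optimiser that overwrites θ in place updates every view automatically.
nn.θ .= 1:length(nn.θ)

# Writing through a view is likewise visible in the flat vector.
nn.W_d[1, 1] = -1.0
```

With this layout the optimiser can be handed `nn.θ` directly, and the same construction can be used for a flat gradient vector, so neither `pack` nor `unpack!` is needed. If the optimiser returns a fresh vector rather than mutating in place, it would have to be copied into `nn.θ` with `copyto!`.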
