In the long term, the best way to do this will be to use SubArray and ReshapeArray. You'll allocate enough space for all parameters, then unpack them into separate objects when that helps.
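A minimal sketch of that idea, assuming a current Julia (1.x), where `view` returns a SubArray and `reshape` of a view gives a reshaped array that shares the same memory. The helper name `unpack_views` and the layer sizes (`n_in`, `n_hidden`) are made up for illustration, and the autoencoder-style shapes are an assumption based on the field names in the original post:

```julia
# Hypothetical helper (not from the thread): carve no-copy views for the four
# parameter blocks out of one flat vector θ. Assumes Julia 1.x.
function unpack_views(θ::AbstractVector, n_in::Int, n_hidden::Int)
    len_W = n_in * n_hidden
    @assert length(θ) == 2 * len_W + n_hidden + n_in
    o = 0  # running offset into θ
    W_e = reshape(view(θ, o+1:o+len_W), n_hidden, n_in); o += len_W
    b_e = view(θ, o+1:o+n_hidden);                        o += n_hidden
    W_d = reshape(view(θ, o+1:o+len_W), n_in, n_hidden); o += len_W
    b_d = view(θ, o+1:o+n_in)
    return (W_e = W_e, b_e = b_e, W_d = W_d, b_d = b_d)
end

n_in, n_hidden = 3, 2
θ = zeros(2 * n_in * n_hidden + n_hidden + n_in)  # the vector the optimiser owns
p = unpack_views(θ, n_in, n_hidden)

p.W_e[1, 1] = 1.0    # writes through to θ[1]
p.b_d[end]  = 2.0    # writes through to θ[end]
```

Because the matrices and bias vectors are only views, the optimiser can keep updating θ in place and the loss code sees the new values without any pack or unpack copy. The same trick should apply to the gradient vector.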
-- John

On Thursday, May 14, 2015 at 2:03:27 AM UTC-7, Tim Holy wrote:
>
> Some Optim algorithms, like cg, already allow you to optimize a matrix.
>
> --Tim
>
> On Wednesday, May 13, 2015 11:50:00 PM Lyndon White wrote:
> > Hi all,
> > I've been thinking about this for a while.
> >
> > Numerical optimisation libraries, e.g. NLopt
> > (https://github.com/JuliaOpt/NLopt.jl) and Optim
> > (https://github.com/JuliaOpt/Optim.jl),
> > require the parameter to be optimised (x) to be a vector.
> >
> > In neural networks, the parameters to be optimised are weight
> > matrices and bias vectors.
> >
> > The workaround for training a neural network with such an
> > optimisation library is to pack those matrices and vectors down into
> > a single vector when returning the gradient, and to unpack that
> > vector into the matrices and vectors when asked to evaluate the
> > gradient/loss.
> >
> > As follows:
> >
> > type NN
> >     W_e::Matrix{Float64}
> >     b_e::Vector{Float64}
> >     W_d::Matrix{Float64}
> >     b_d::Vector{Float64}
> > end
> >
> > function unpack!(nn::NN, θ::Vector)
> >     W_e_len = length(nn.W_e)
> >     b_e_len = length(nn.b_e)
> >     W_d_len = length(nn.W_d)
> >     b_d_len = length(nn.b_d)
> >     W_e_shape = size(nn.W_e)
> >     W_d_shape = size(nn.W_d)
> >
> >     # Each slice θ[a:b] copies its elements out of θ
> >     nn.W_e = reshape(θ[1 : W_e_len], W_e_shape)
> >     nn.b_e = θ[W_e_len+1 : W_e_len+b_e_len]
> >     nn.W_d = reshape(θ[W_e_len+b_e_len+1 : W_e_len+b_e_len+W_d_len],
> >                      W_d_shape)
> >     nn.b_d = θ[W_e_len+b_e_len+W_d_len+1 : end]
> >
> >     nn
> > end
> >
> > function pack(nn::NN)
> >     pack(nn.W_e, nn.b_e, nn.W_d, nn.b_d)
> > end
> >
> > function pack(∇W_e::Matrix{Float64}, ∇b_e::Vector{Float64},
> >               ∇W_d::Matrix{Float64}, ∇b_d::Vector{Float64})
> >     # Flatten and concatenate in the same order that unpack! expects
> >     [∇W_e[:]; ∇b_e; ∇W_d[:]; ∇b_d]
> > end
> >
> > Then use it like:
> >
> > # NLopt and Optim both provide the grad vector to be overwritten in place
> > function loss_and_loss_grad!(θ::Vector, grad::Vector)
> >     fill!(grad, 0)
> >     # Keep a global nn to track size (and handy if the algorithm crashes)
> >     unpack!(nn_outer, θ)
> >
> >     function loss_and_loss_grad(train_datum)
> >         ∇W_e, ∇b_e, ∇W_d, ∇b_d, err =
> >             loss_and_loss_grad_single(nn_outer, train_datum)
> >         [pack(∇W_e, ∇b_e, ∇W_d, ∇b_d); err]
> >     end
> >
> >     ret = map(loss_and_loss_grad, training_data) |> sum
> >     grad[:] = ret[1:end-1]
> >     err = ret[end]
> >
> >     grad[:] /= length(training_data)
> >     err /= length(training_data)
> >     err
> > end
> >
> > This works.
> > But it involves excessive array copies (I suspect).
> >
> > The order in the packed vector does not matter, so long as it is
> > consistent.
> >
> > Now, memory is already linear -- matrices are vectors in memory with
> > special operations defined that say how to interpret them in 2D,
> > and the matrices in the composite type are adjacent in memory (I
> > assume, since why not be like a C struct).
> >
> > So it is logically simple to just reinterpret them as a single
> > vector. I don't think reinterpret works on composite types, though.
> >
> > In C or PL/I, this could be solved by defining the composite type as
> > an untagged union of a vector and a structure.
> > I don't think Julia has this facility. (It is pretty niche; this is
> > one of the only times I can think of it being actually convenient.)
> >
> > Does anyone have any suggestions?
> >
> > Regards
