In the long term, the best way to do this will be to use SubArray and 
ReshapedArray. You'll allocate enough space for all parameters, then unpack 
them into separate objects when that helps.
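
A minimal sketch of that idea, assuming a recent Julia (1.x), where `view` returns a SubArray and `reshape` of a view typically yields a ReshapedArray. The sizes and names (n_in, n_hid, θ, W, b) are made up for illustration and are not from the original code:

```julia
# Illustrative sizes (assumptions, not taken from the thread)
n_in, n_hid = 4, 3

# One flat parameter vector holding a weight matrix followed by a bias vector
θ = zeros(n_hid * n_in + n_hid)

# Views into θ: no data is copied, and writes go through to θ
W = reshape(view(θ, 1:n_hid*n_in), n_hid, n_in)
b = view(θ, n_hid*n_in+1:length(θ))

W[2, 1] = 1.5    # column-major layout: element (2,1) is θ[2]
b[end] = -2.0    # last bias entry is the last entry of θ
```

The optimiser would only ever see θ, while the loss code works with W and b directly, so no packing or unpacking copies are needed.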

 -- John

On Thursday, May 14, 2015 at 2:03:27 AM UTC-7, Tim Holy wrote:
>
> Some Optim algorithms, like cg, already allow you to optimize a matrix. 
>
> --Tim 
>
> On Wednesday, May 13, 2015 11:50:00 PM Lyndon White wrote: 
> > Hi all, 
> > I've been thinking about this for a while. 
> > 
> > Numerical optimisation libraries, e.g. NLopt 
> > (https://github.com/JuliaOpt/NLopt.jl) and Optim 
> > (https://github.com/JuliaOpt/Optim.jl), 
> > require the parameter to be optimised (x) to be a vector. 
> > 
> > In neural networks, the parameters to be optimised are weight matrices 
> > and bias vectors. 
> > 
> > The workaround for training a neural network with such an optimisation 
> > library is to pack those matrices and vectors down into a single vector 
> > when returning the gradient, and to unpack it into the matrices and 
> > vectors when asked to evaluate the gradient/loss. 
> > 
> > Like so: 
> > 
> > type NN 
> > 
> >     W_e::Matrix{Float64} 
> >     b_e::Vector{Float64} 
> >     W_d::Matrix{Float64} 
> >     b_d::Vector{Float64} 
> > 
> > end 
> > 
> > 
> > function unpack!(nn::NN, θ::Vector) 
> >     W_e_len = length(nn.W_e) 
> >     b_e_len = length(nn.b_e) 
> >     W_d_len = length(nn.W_d) 
> >     b_d_len = length(nn.b_d) 
> >     W_e_shape = size(nn.W_e) 
> >     W_d_shape = size(nn.W_d) 
> > 
> >     nn.W_e = reshape(θ[1:W_e_len], W_e_shape) 
> >     nn.b_e = θ[W_e_len+1:W_e_len+b_e_len] 
> >     nn.W_d = reshape(θ[W_e_len+b_e_len+1:W_e_len+b_e_len+W_d_len], 
> >                      W_d_shape) 
> >     nn.b_d = θ[W_e_len+b_e_len+W_d_len+1:end] 
> > 
> >     nn 
> > end 
> > 
> > function pack(nn::NN) 
> >     # Delegate to the four-argument method, which does the flattening 
> >     pack(nn.W_e, nn.b_e, nn.W_d, nn.b_d) 
> > end 
> > 
> > function pack(∇W_e::Matrix{Float64}, ∇b_e::Vector{Float64}, 
> >               ∇W_d::Matrix{Float64}, ∇b_d::Vector{Float64}) 
> >     [∇W_e[:]; ∇b_e; ∇W_d[:]; ∇b_d]   # concatenate into one flat vector 
> > end 
> > 
> > 
> > 
> > 
> > Then use it like: 
> > 
> > # NLopt and Optim both provide the grad vector, to be overwritten in place 
> > function loss_and_loss_grad!(θ::Vector, grad::Vector) 
> >     fill!(grad, 0) 
> >     # Keep a global nn to track sizes (and handy if the algorithm crashes) 
> >     unpack!(nn_outer, θ) 
> > 
> >     function loss_and_loss_grad(train_datum) 
> >         ∇W_e, ∇b_e, ∇W_d, ∇b_d, err = 
> >             loss_and_loss_grad_single(nn_outer, train_datum) 
> >         [pack(∇W_e, ∇b_e, ∇W_d, ∇b_d); err]   # gradient with the loss appended 
> >     end 
> > 
> >     ret = map(loss_and_loss_grad, training_data)|> sum 
> >     grad[:] = ret[1:end-1] 
> >     err=ret[end] 
> > 
> >     grad[:]/=length(training_data) 
> >     err/=length(training_data) 
> >     err 
> > end 
> > 
> > 
> > 
> > 
> > 
> > This works. 
> > But it involves excessive array copies (I suspect). 
> > 
> > The order in the packed vector does not matter, so long as it is 
> > consistent. 
> > 
> > Now, memory is already linear -- matrices are vectors in memory with 
> > special operations defined that say how to interpret them in 2D -- 
> > and the matrices in the composite type are adjacent in memory (I assume, 
> > since why not be like a C struct). 
> > 
> > So it is logically simple to just reinterpret them as a single vector. 
> > I don't think reinterpret works on composite types, though. 
> > 
> > In C or PL/I, this could be solved by defining the composite type as an 
> > untagged union of a vector and a structure. 
> > I don't think Julia has this facility. (It is pretty niche; this is one of 
> > the only times I can think of where it would actually be convenient.) 
> > 
> > 
> > Anyone have any suggestions? 
> > 
> > Regards 
>
>
