Some Optim algorithms, like cg, already allow you to optimize a matrix.
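
For the case with several arrays, one option that avoids copies is to go the
other way: keep a single flat vector and make the matrices views into it. A
minimal sketch, assuming Julia 1.x and a layout of W_e, b_e, W_d, b_d in that
order; `param_views` is a made-up helper name, not something Optim or NLopt
provides:

```julia
# Hypothetical helper (not part of Optim or NLopt): build views into the flat
# parameter vector θ, so the matrices and θ share the same memory.
function param_views(θ::AbstractVector, n_in::Int, n_hidden::Int)
    o1 = n_hidden * n_in         # index of the last element of W_e
    o2 = o1 + n_hidden           # index of the last element of b_e
    o3 = o2 + n_in * n_hidden    # index of the last element of W_d
    @assert length(θ) == o3 + n_in
    W_e = reshape(view(θ, 1:o1), n_hidden, n_in)
    b_e = view(θ, o1+1:o2)
    W_d = reshape(view(θ, o2+1:o3), n_in, n_hidden)
    b_d = view(θ, o3+1:o3+n_in)
    return W_e, b_e, W_d, b_d
end

θ = zeros(2*3 + 2 + 3*2 + 3)     # n_in = 3, n_hidden = 2
W_e, b_e, W_d, b_d = param_views(θ, 3, 2)

θ[1] = 1.0     # a write through the flat vector ...
W_e[1, 1]      # ... is visible through the matrix view, with no copy
b_d .= 2.0     # and a write through a view lands in θ
```

The same helper could be applied to the grad vector, so that the gradient
pieces are written straight into the buffer the optimizer hands over.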

--Tim

On Wednesday, May 13, 2015 11:50:00 PM Lyndon White wrote:
> Hi all,
> I've been thinking about this for a while.
> 
> Numerical optimisation libraries, e.g.
> NLopt (https://github.com/JuliaOpt/NLopt.jl) and Optim
> (https://github.com/JuliaOpt/Optim.jl),
> require the parameter to be optimised (x) to be a vector.
> 
> In neural networks, the parameters to be optimised are weight matrices and
> bias vectors.
> 
> The workaround for training a neural network with such an optimisation
> library is to pack those matrices and vectors down into a single vector when
> returning the gradient,
> and to unpack that vector into the matrices and vectors when asked to
> evaluate the gradient/loss.
> 
> As follows:
> 
> mutable struct NN   # `type NN` before Julia 0.6
>     W_e::Matrix{Float64}
>     b_e::Vector{Float64}
>     W_d::Matrix{Float64}
>     b_d::Vector{Float64}
> end
> 
> 
> function unpack!(nn::NN, θ::Vector)
>     W_e_len = length(nn.W_e)
>     b_e_len = length(nn.b_e)
>     W_d_len = length(nn.W_d)
>     b_d_len = length(nn.b_d)
>     W_e_shape = size(nn.W_e)
>     W_d_shape = size(nn.W_d)
> 
>     nn.W_e = reshape(θ[1:W_e_len], W_e_shape)
>     nn.b_e = θ[W_e_len+1:W_e_len+b_e_len]
>     nn.W_d = reshape(θ[W_e_len+b_e_len+1:W_e_len+b_e_len+W_d_len],
>                      W_d_shape)
>     nn.b_d = θ[W_e_len+b_e_len+W_d_len+1:end]
> 
>     nn
> end
> 
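> # Flatten all parameters of the network into a single vector.
> function pack(nn::NN)
>     pack(nn.W_e, nn.b_e, nn.W_d, nn.b_d)
> end
> 
> # Flatten a set of arrays (parameters or gradients) into a single vector.
> function pack(∇W_e::Matrix{Float64}, ∇b_e::Vector{Float64},
>               ∇W_d::Matrix{Float64}, ∇b_d::Vector{Float64})
>     [∇W_e[:]; ∇b_e; ∇W_d[:]; ∇b_d]
> end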
> 
> 
> 
> 
> Then use it like:
> 
> # NLopt and Optim both provide the grad vector to be overwritten in place.
> function loss_and_loss_grad!(θ::Vector, grad::Vector)
>     fill!(grad, 0)
>     # Keep a global nn to track sizes (and handy if the algorithm crashes).
>     unpack!(nn_outer, θ)
> 
> 
>     function loss_and_loss_grad(train_datum)
>         ∇W_e, ∇b_e, ∇W_d, ∇b_d, err =
>             loss_and_loss_grad_single(nn_outer, train_datum)
>         [pack(∇W_e, ∇b_e, ∇W_d, ∇b_d); err]  # gradient with the loss appended
>     end
> 
>     ret = map(loss_and_loss_grad, training_data)|> sum
>     grad[:] = ret[1:end-1]
>     err=ret[end]
> 
>     grad[:]/=length(training_data)
>     err/=length(training_data)
>     err
> end
> 
> 
> 
> 
> 
> This works.
> But it involves excessive array copies (I suspect).
> 
> The order in the packed vector does not matter, so long as it is consistent.
> 
> Now, memory is already linear: matrices are vectors in memory with
> special operations defined that say how to interpret them in 2D,
> and the matrices in the composite type are adjacent in memory (I assume,
> since why not be like a C struct).
> 
> So it is logically simple to just reinterpret them as a single vector.
> I don't think reinterpret works on composite types, though.
> 
> In C or PL/I, this could be solved by defining the composite type as an
> untagged union of a vector and a structure.
> I don't think Julia has this facility. (It is pretty niche; this is one of
> the only times I can think of it being actually convenient.)
> 
> 
> Anyone have any suggestions?
> 
> Regards
