Hi all,
I've been thinking about this for a while.

Numerical optimisation libraries, e.g. NLopt 
(https://github.com/JuliaOpt/NLopt.jl) and Optim 
(https://github.com/JuliaOpt/Optim.jl),
require the parameter to be optimised (x) to be a vector.

In neural networks, the parameters to be optimised are weight matrices and 
bias vectors.

The workaround to train a neural network with such an optimisation 
library is to pack those matrices and vectors down into a single vector when 
returning the gradient, 
and to unpack that vector into the matrices and vectors when asked to evaluate 
the gradient/loss.

Like this:

type NN
   
    W_e::Matrix{Float64}
    b_e::Vector{Float64}
    W_d::Matrix{Float64}
    b_d::Vector{Float64}
   
end


function unpack!(nn::NN, θ::Vector)
    W_e_len = length(nn.W_e)
    b_e_len = length(nn.b_e)
    W_d_len = length(nn.W_d)
    b_d_len = length(nn.b_d)
    W_e_shape = size(nn.W_e)
    W_d_shape = size(nn.W_d)
    
    nn.W_e = reshape(θ[1:W_e_len], W_e_shape)
    nn.b_e = θ[W_e_len+1:W_e_len+b_e_len]
    nn.W_d = reshape(θ[W_e_len+b_e_len+1:W_e_len+b_e_len+W_d_len], W_d_shape)
    nn.b_d = θ[W_e_len+b_e_len+W_d_len+1:end]
    
    nn
end

# Pack the network's own parameters, in the same order that unpack! reads them.
function pack(nn::NN)
    pack(nn.W_e, nn.b_e, nn.W_d, nn.b_d)
end

# Flatten everything into one vector, in the order W_e, b_e, W_d, b_d.
function pack(∇W_e::Matrix{Float64}, ∇b_e::Vector{Float64},
              ∇W_d::Matrix{Float64}, ∇b_d::Vector{Float64})
    vcat(∇W_e[:], ∇b_e, ∇W_d[:], ∇b_d)
end




Then use it like:

# NLopt and Optim both provide the grad vector, to be overwritten in place.
function loss_and_loss_grad!(θ::Vector, grad::Vector)
    fill!(grad, 0)
    # Keep a global nn to track size (and handy if the algorithm crashes).
    unpack!(nn_outer, θ)
    
    
    function loss_and_loss_grad(train_datum)
        ∇W_e, ∇b_e, ∇W_d, ∇b_d, err = loss_and_loss_grad_single(nn_outer,
                                                               train_datum)
        # Append the scalar loss so gradient and loss can be summed together.
        vcat(pack(∇W_e, ∇b_e, ∇W_d, ∇b_d), err)
    end
    
    ret = map(loss_and_loss_grad, training_data)|> sum 
    grad[:] = ret[1:end-1]
    err=ret[end]
    
    grad[:]/=length(training_data)
    err/=length(training_data)
    err
end





This works.
But it involves excessive array copies (I suspect).

The order in the packed vector does not matter, so long as it is consistent.
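
A quick round-trip check makes the consistency requirement concrete. This is 
only a sketch, assuming Julia 1.x syntax (`mutable struct` instead of `type`); 
the layer sizes are made up for illustration:

```julia
# Sketch for Julia 1.x; `mutable struct` replaces the older `type` keyword.
mutable struct NN
    W_e::Matrix{Float64}
    b_e::Vector{Float64}
    W_d::Matrix{Float64}
    b_d::Vector{Float64}
end

# Flatten in a fixed order: W_e, b_e, W_d, b_d.
pack(nn::NN) = vcat(vec(nn.W_e), nn.b_e, vec(nn.W_d), nn.b_d)

# Read the slices back in exactly the order used by pack.
function unpack!(nn::NN, θ::Vector{Float64})
    n1 = length(nn.W_e)
    n2 = n1 + length(nn.b_e)
    n3 = n2 + length(nn.W_d)
    nn.W_e = reshape(θ[1:n1], size(nn.W_e))
    nn.b_e = θ[n1+1:n2]
    nn.W_d = reshape(θ[n2+1:n3], size(nn.W_d))
    nn.b_d = θ[n3+1:end]
    nn
end

# Hypothetical sizes: 3 inputs, 2 hidden units.
nn = NN(rand(2, 3), rand(2), rand(3, 2), rand(3))
θ = pack(nn)

# Unpack into a second network of the same shape and compare.
other = NN(zeros(2, 3), zeros(2), zeros(3, 2), zeros(3))
unpack!(other, θ)
roundtrip_ok = pack(other) == θ && other.W_e == nn.W_e && other.b_d == nn.b_d
```

If `pack` and `unpack!` disagreed on the order, `roundtrip_ok` would come out 
false for almost any random parameters.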

Now, memory is already linear -- matrices are vectors in memory with 
special operations defined that say how to interpret them in 2D,
and the matrices in the composite type are adjacent in memory (I assume, 
since why not be like a C struct).

So it is logically simple to just reinterpret them as a single vector.
I don't think reinterpret works on composite types, though.

In C or PL/I, this could be solved by defining the composite type as an 
untagged union of a vector and a structure.
I don't think Julia has this facility. (It is pretty niche; this is one of 
the only times I can think of it being actually convenient.)
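
The closest thing I can sketch is to turn the ownership around: keep one flat 
vector and make the matrices views into it, so nothing is copied. This is a 
minimal sketch, assuming Julia 1.x, where `view` and `reshape` share memory 
with the parent array; `flat_views` and the sizes are hypothetical names of 
mine, not anything from NLopt or Optim:

```julia
# Hypothetical helper: carve one flat parameter vector into shared-memory views.
function flat_views(θ::Vector{Float64}, n_in::Int, n_hid::Int)
    n1 = n_hid * n_in        # end of W_e
    n2 = n1 + n_hid          # end of b_e
    n3 = n2 + n_in * n_hid   # end of W_d
    n4 = n3 + n_in           # end of b_d
    length(θ) == n4 || throw(ArgumentError("θ has the wrong length"))
    (W_e = reshape(view(θ, 1:n1), n_hid, n_in),
     b_e = view(θ, n1+1:n2),
     W_d = reshape(view(θ, n2+1:n3), n_in, n_hid),
     b_d = view(θ, n3+1:n4))
end

θ = zeros(17)                # 2*3 + 2 + 3*2 + 3 parameters
p = flat_views(θ, 3, 2)

p.W_e[2, 3] = 1.5            # writes through to θ[6] (column-major order)
θ[7] = -2.0                  # shows up as p.b_e[1]
```

The views are not `Matrix{Float64}`, so the fields of `NN` would need looser 
types (e.g. `AbstractMatrix{Float64}`) for this to fit, and I have not measured 
whether it is actually faster.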


Anyone have any suggestions?

Regards
