Hi all,
I've been thinking about this for a while.
Numerical optimisation libraries, e.g.
NLopt (https://github.com/JuliaOpt/NLopt.jl) and Optim
(https://github.com/JuliaOpt/Optim.jl),
require the parameter to be optimised (x) to be a vector.
In neural networks, the parameters to be optimised are weight matrices and
bias vectors.
The workaround to train a neural network with such an optimisation
library is to pack those matrices and vectors down into a single vector when
returning the gradient,
and to unpack it into the matrices and vectors when asked to evaluate the
gradient/loss.
As follows:
mutable struct NN  # written `type NN` on Julia versions before 0.6
    W_e::Matrix{Float64}
    b_e::Vector{Float64}
    W_d::Matrix{Float64}
    b_d::Vector{Float64}
end
function unpack!(nn::NN, θ::Vector)
    W_e_len = length(nn.W_e)
    b_e_len = length(nn.b_e)
    W_d_len = length(nn.W_d)
    b_d_len = length(nn.b_d)
    W_e_shape = size(nn.W_e)
    W_d_shape = size(nn.W_d)
    # Each slice copies out of θ; the order must match pack below
    nn.W_e = reshape(θ[1:W_e_len], W_e_shape)
    nn.b_e = θ[W_e_len+1:W_e_len+b_e_len]
    nn.W_d = reshape(θ[W_e_len+b_e_len+1:W_e_len+b_e_len+W_d_len], W_d_shape)
    nn.b_d = θ[W_e_len+b_e_len+W_d_len+1:end]
    nn
end
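function pack(nn::NN)
    pack(nn.W_e, nn.b_e, nn.W_d, nn.b_d)
end
function pack(∇W_e::Matrix{Float64}, ∇b_e::Vector{Float64}, ∇W_d::Matrix{Float64}, ∇b_d::Vector{Float64})
    # `;` concatenates into one flat vector; [:] flattens column-major,
    # which is the order reshape expects in unpack!
    [∇W_e[:]; ∇b_e; ∇W_d[:]; ∇b_d]
end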
Then use it like:
# NLopt and Optim both provide the grad vector to be overwritten in place
function loss_and_loss_grad!(θ::Vector, grad::Vector)
    fill!(grad, 0)
    # Keep a global nn to track sizes (and handy if the algorithm crashes)
    unpack!(nn_outer, θ)
    function loss_and_loss_grad(train_datum)
        ∇W_e, ∇b_e, ∇W_d, ∇b_d, err = loss_and_loss_grad_single(nn_outer, train_datum)
        # Append the loss as the last element so one sum covers both
        [pack(∇W_e, ∇b_e, ∇W_d, ∇b_d); err]
    end
    ret = map(loss_and_loss_grad, training_data) |> sum
    grad[:] = ret[1:end-1]
    err = ret[end]
    grad[:] /= length(training_data)
    err /= length(training_data)
    err
end
This works.
But it involves excessive array copies (I suspect).
The order in the packed vector does not matter, so long as it is consistent.
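To make the consistency point concrete, here is a minimal self-contained sketch with toy sizes. The names W, b, θ, W2 and b2 are purely illustrative and are not part of the code above. It relies on [:] and reshape both using column-major order, so packing and then unpacking in the same order gives back the original values:

```julia
# Toy parameters; all names here are illustrative only.
W = [1.0 3.0 5.0;
     2.0 4.0 6.0]          # 2×3 weight matrix
b = [7.0, 8.0]             # bias vector

# Pack: W[:] flattens column by column, then b is appended.
θ = [W[:]; b]

# Unpack using the same order and the original shape.
W2 = reshape(θ[1:length(W)], size(W))
b2 = θ[length(W)+1:end]

println(θ)
println(W2 == W && b2 == b)
```

Note that every step here (W[:], the concatenation, and each slice of θ) allocates a new array, which is the copying I am worried about.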
Now, memory is already linear: matrices are vectors in memory with
special operations defined that say how to interpret them in 2D,
and the matrices in the composite type are adjacent in memory (I assume,
since why not be like a C struct).
So it is logically simple to just reinterpret them as a single vector.
I don't think reinterpret works on composite types, though.
In C or PL/I, this could be solved by defining the composite type as an
untagged union of a vector and a structure.
I don't think Julia has this facility. (It is pretty niche; this is one of
the only times I can think of it being actually convenient.)
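To illustrate what I mean by one block of memory seen both ways, here is a minimal sketch with toy sizes. It assumes a recent Julia (1.x) where view and reshape share memory with the parent array, and the names θ, W and b are illustrative only. It sidesteps the composite type entirely, so it shows the effect I am after rather than a solution for NN as defined above:

```julia
# One flat parameter vector, as the optimiser wants it (toy sizes).
θ = zeros(8)

# W and b are views into θ, not copies (illustrative names).
W = reshape(view(θ, 1:6), 2, 3)   # 2×3 matrix over θ[1:6], column-major
b = view(θ, 7:8)                  # bias over θ[7:8]

# Writing through the views changes θ itself ...
W[2, 3] = 6.0
b[1] = 7.0

# ... and writing to θ is visible through the views.
θ[1] = 1.0

println(θ)
println(W[1, 1])
```

Here W is not a plain Matrix{Float64}, so it would not fit the field types of NN without loosening them.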
Anyone have any suggestions?
Regards