I started out by putting an @time macro call on the function that I
figured was taking the most time. The result looked like:
elapsed time: 8.429919506 seconds (4275452256 bytes allocated, 37.36% gc time)
... so lots of bytes being allocated.
To get a better picture of where that was happening, I tried running Julia
with --track-allocation=user.
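(Since compilation itself allocates, the counts come out cleanest if the
function is run once to warm it up and the counters are then reset before
the measured run; something along these lines, if I have the name right:

    fcq_clust(img, ann, blend0)   # warm-up call, so compilation isn't counted
    Profile.clear_malloc_data()   # reset the per-line allocation counters
    fcq_clust(img, ann, blend0)   # only this run shows up in the .mem files

The .mem files get written out when Julia exits.)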
Looking in the .mem file for the same function I had prefixed with @time,
I see:
        - function fcq_clust(img::ImgHSV,ann::ANN.ArtificialNeuralNetwork,blend0::Matrix{Float32})
        0     img_hs_mean::Float32 = 0.5
        0     left::HSV{Float32} = HSV(float32(0.0),float32(0.0),float32(0.0))
        0     center::HSV{Float32} = HSV(float32(0.0),float32(0.0),float32(0.0))
        0     right::HSV{Float32} = HSV(float32(0.0),float32(0.0),float32(0.0))
      768     param_array::Vector{Float32} = Array(Float32,10)
        0     param_array[7] = img.s_mean
        0     param_array[8] = img.v_mean
        0     param_array[9] = img.s_std
        0     param_array[10] = img.v_std
        0     for y in 1:img.height
        0         @simd for x in 1:img.wid
        0             if 1 < x < img.wid
        0                 @inbounds left = img.data[x-1,y]
        0                 @inbounds center = img.data[x,y]
        0                 @inbounds right = img.data[x+1,y]
        -
        0                 @inbounds param_array[1] = left.s
        0                 @inbounds param_array[2] = center.s
        0                 @inbounds param_array[3] = right.s
        0                 @inbounds param_array[4] = left.v
        0                 @inbounds param_array[5] = center.v
        0                 @inbounds param_array[6] = right.v
        -
        0                 ANN.predict(ann,param_array)
        -
        0                 @inbounds blend0[x,y] = param_array[1]
        -             else
        0                 @inbounds blend0[x,y] = img_hs_mean
        -             end
        -         end
        0     end
        0 end
It looks pretty OK. But then I looked at the .mem file for the ANN.predict
function:
        - function predict(ann::ArtificialNeuralNetwork,x::Vector{Float32})
        0     for i in 1:length(ann.layers)
        0         x = forward_propagate(ann.layers[i], x)
        -     end
        - end
Again, that looks fine, but then I checked the forward_propagate function:
          - function forward_propagate(nl::NeuralLayer,x::Vector{Float32})
          0     nl.hx = x
-1828754432     nl.pa = nl.b + nl.w * x
          0     nl.pr = tanh(nl.pa).*nl.scale
          - end
Aha! Now we're getting somewhere. Apparently so much memory was allocated
there that the counter overflowed and went negative(!)
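(As a sanity check: if the counter is a 32-bit signed integer that wrapped
just once, the true figure would be 2^32 - 1828754432 = 2466212864 bytes,
roughly 2.5 GB on that single line, which is at least in the same ballpark
as the ~4.3 GB total reported by @time.)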
The NeuralLayer type is defined as:
type NeuralLayer
    w::Matrix{Float32}     # weights
    cm::Matrix{Float32}    # connection matrix
    b::Vector{Float32}     # biases
    scale::Vector{Float32} #
    a_func::Symbol         # activation function
    hx::Vector{Float32}    # input values
    pa::Vector{Float32}    # pre-activation values
    pr::Vector{Float32}    # predictions (activation values)
    frozen::Bool
end
Since pa is already preallocated as a field of the type, I gather the
problem is that the right-hand side builds fresh temporary arrays and
the field just gets rebound to a new one on every call. Any ideas for
reducing the memory allocation in:

    nl.pa = nl.b + nl.w * x
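One thing I've been wondering about is the in-place multiply from Base
(just a sketch, assuming A_mul_B! is the right call for a Float32
matrix-vector product; untested):

    # fill the preallocated nl.pa instead of binding it to a fresh array
    A_mul_B!(nl.pa, nl.w, x)           # in-place nl.pa = nl.w * x
    @inbounds for i in 1:length(nl.pa)
        nl.pa[i] += nl.b[i]            # add the bias without a temporary
    end

and presumably the tanh(nl.pa).*nl.scale line could get the same
treatment with a loop writing into nl.pr. Does that sound like the right
direction, or is there a better way?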
Phil