I started out by putting an @time macro call on the function that I figured was taking the most time. The results looked like:

    elapsed time: 8.429919506 seconds (4275452256 bytes allocated, 37.36% gc time)

... so lots of bytes being allocated. 
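For reference, producing that number is just a matter of prefixing the call, something like this (the argument names here are mine; the signature comes from the .mem listing below):

    @time fcq_clust(img, ann, blend0)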

To get a better picture of where that was happening, I tried running Julia with --track-allocation=user.
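That is, something like this (the script name is hypothetical):

    julia --track-allocation=user run_clustering.jl

When the process exits, Julia writes a .mem file next to each source file, with a per-line allocation count in bytes.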

Looking in the .mem file for the same function I had prefixed with @time, I see:

        - function fcq_clust(img::ImgHSV,ann::ANN.ArtificialNeuralNetwork,blend0::Matrix{Float32})
        0  img_hs_mean::Float32 = 0.5
        0  left::HSV{Float32}   = HSV(float32(0.0),float32(0.0),float32(0.0))
        0  center::HSV{Float32} = HSV(float32(0.0),float32(0.0),float32(0.0))
        0  right::HSV{Float32}  = HSV(float32(0.0),float32(0.0),float32(0.0))
      768  param_array::Vector{Float32} = Array(Float32,10)
        0  param_array[7] = img.s_mean 
        0  param_array[8] = img.v_mean
        0  param_array[9] = img.s_std  
        0  param_array[10]= img.v_std
        0  for y in 1:img.height
        0    @simd for x in 1:img.wid
        0      if 1 < x < img.wid
        0        @inbounds left   = img.data[x-1,y]
        0        @inbounds center = img.data[x,y]
        0        @inbounds right  = img.data[x+1,y]
        - 
        0        @inbounds param_array[1] = left.s
        0        @inbounds param_array[2] = center.s
        0        @inbounds param_array[3] = right.s
        0        @inbounds param_array[4] = left.v
        0        @inbounds param_array[5] = center.v
        0        @inbounds param_array[6] = right.v
        - 
        0        ANN.predict(ann,param_array)
        - 
        0        @inbounds blend0[x,y] = param_array[1]
        -      else
        0        @inbounds blend0[x,y] = img_hs_mean
        -      end
        -    end
        0  end
        0 end 


That looks pretty OK: the only allocation is the 768 bytes for param_array, and that happens once, outside the pixel loop. But then I looked at the .mem file for the ANN.predict function:

        - function predict(ann::ArtificialNeuralNetwork,x::Vector{Float32})
        0     for i in 1:length(ann.layers)
        0         x = forward_propagate(ann.layers[i], x)
        -     end
        - end

Again, that looks fine, but then I checked the forward_propagate function:

        - function forward_propagate(nl::NeuralLayer,x::Vector{Float32})
        0   nl.hx = x 
-1828754432   nl.pa = nl.b + nl.w * x
        0   nl.pr = tanh(nl.pa).*nl.scale 
        - end

Aha! Now we're getting somewhere. Apparently so much memory was allocated there that the counter overflowed and went negative(!)
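If the per-line counter is a signed 32-bit integer that wrapped around once (an assumption on my part), the true figure would be roughly:

    julia> 2^32 - 1828754432   # bytes allocated on that line, assuming one wrap
    2466212864

i.e. about 2.3 GiB on that single line, which is at least consistent with the ~4.3 GB total that @time reported.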

The NeuralLayer type is defined as:

type NeuralLayer
    w::Matrix{Float32}   # weights
    cm::Matrix{Float32}  # connection matrix 
    b::Vector{Float32}   # biases
    scale::Vector{Float32}  # output scaling (multiplies the tanh output)
    a_func::Symbol     # activation function
    hx::Vector{Float32}  # input values
    pa::Vector{Float32}  # pre activation values
    pr::Vector{Float32}  # predictions (activation values)
    frozen::Bool
end

Any ideas about reducing the memory allocation in:

    nl.pa = nl.b + nl.w * x
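As far as I can tell, both nl.w * x and the subsequent + allocate a fresh temporary vector on every call, i.e. once per pixel. One direction I'm considering (an untested sketch, assuming nl.pa and nl.pr are allocated once, at construction time, with the right lengths) is to do the update in place with A_mul_B! and an explicit loop:

    function forward_propagate(nl::NeuralLayer, x::Vector{Float32})
        nl.hx = x
        A_mul_B!(nl.pa, nl.w, x)           # nl.w * x written directly into nl.pa
        @inbounds for i in 1:length(nl.pa)
            nl.pa[i] += nl.b[i]            # add the bias in place, no temporary
            nl.pr[i] = tanh(nl.pa[i]) * nl.scale[i]  # also avoids the tanh(...) .* temporaries
        end
        return nl.pr                       # same return value as before, so predict still works
    end

But I'd welcome better ideas.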

Phil
