Reposting this from the Gitter chat since this forum seems to be more active. I'm writing a GloVe module to learn Julia.
How can I avoid memory allocations? My main function does a lot of random indexing into matrices, e.g.

```julia
A[i, :] = 0.5 * B[i, :]
```

In this case `i` isn't from a linear sequence; I'm not sure that matters. Anyway, I've done some analysis and I know `B[i, :]` is the issue here since it creates a copy (https://github.com/JuliaLang/julia/blob/master/base/array.jl#L309 makes the copy). I tried doing it with a loop, but that doesn't seem to help either; in fact, it appears to allocate slightly more memory, which seems really odd.

Here's some of the code. It's a little messy since I'm commenting out the different approaches I'm trying:

```julia
type Model{T}
    W_main::Matrix{T}
    W_ctx::Matrix{T}
    b_main::Vector{T}
    b_ctx::Vector{T}
    W_main_grad::Matrix{T}
    W_ctx_grad::Matrix{T}
    b_main_grad::Vector{T}
    b_ctx_grad::Vector{T}
    covec::Vector{Cooccurence}
end

# Each vocab word is associated with a main vector and a context vector.
# The paper initializes these to values in [-0.5, 0.5] / (vecsize + 1) and
# the gradients to 1.0.
#
# The +1 term is for the bias.
function Model(comatrix; vecsize=100)
    vs = size(comatrix, 1)
    Model(
        (rand(vecsize, vs) - 0.5) / (vecsize + 1),
        (rand(vecsize, vs) - 0.5) / (vecsize + 1),
        (rand(vs) - 0.5) / (vecsize + 1),
        (rand(vs) - 0.5) / (vecsize + 1),
        ones(vecsize, vs),
        ones(vecsize, vs),
        ones(vs),
        ones(vs),
        CoVector(comatrix), # not required in 0.4
    )
end

# TODO: figure out memory issue
# The memory comments below are from a 500-iteration test with vecsize=100.
function train!(m::Model, s::Adagrad; xmax=100, alpha=0.75)
    J = 0.0
    shuffle!(m.covec)

    vecsize = size(m.W_main, 1)
    eltype = typeof(m.b_main[1])
    vm = zeros(eltype, vecsize)
    vc = zeros(eltype, vecsize)
    grad_main = zeros(eltype, vecsize)
    grad_ctx = zeros(eltype, vecsize)

    for n = 1:s.niter
        # shuffle indices
        for i = 1:length(m.covec)
            @inbounds l1 = m.covec[i].i  # main index
            @inbounds l2 = m.covec[i].j  # context index
            @inbounds v = m.covec[i].v

            vm[:] = m.W_main[:, l1]
            vc[:] = m.W_ctx[:, l2]

            diff = dot(vec(vm), vec(vc)) + m.b_main[l1] + m.b_ctx[l2] - log(v)
            fdiff = ifelse(v < xmax, (v / xmax) ^ alpha, 1.0) * diff
            J += 0.5 * fdiff * diff

            fdiff *= s.lrate

            # increases memory by ~200 MB and running time by 2x
            grad_main[:] = fdiff * m.W_ctx[:, l2]
            grad_ctx[:] = fdiff * m.W_main[:, l1]

            # Adaptive learning
            # increases memory by ~600 MB and time by 0.75 s
            #= @inbounds for ii = 1:vecsize =#
            #=     m.W_main[ii, l1] -= grad_main[ii] / sqrt(m.W_main_grad[ii, l1]) =#
            #=     m.W_ctx[ii, l2] -= grad_ctx[ii] / sqrt(m.W_ctx_grad[ii, l2]) =#
            #=     m.b_main[l1] -= fdiff ./ sqrt(m.b_main_grad[l1]) =#
            #=     m.b_ctx[l2] -= fdiff ./ sqrt(m.b_ctx_grad[l2]) =#
            #= end =#
            m.W_main[:, l1] -= grad_main ./ sqrt(m.W_main_grad[:, l1])
            m.W_ctx[:, l2] -= grad_ctx ./ sqrt(m.W_ctx_grad[:, l2])
            m.b_main[l1] -= fdiff ./ sqrt(m.b_main_grad[l1])
            m.b_ctx[l2] -= fdiff ./ sqrt(m.b_ctx_grad[l2])

            # Gradients
            fdiff *= fdiff
            m.W_main_grad[:, l1] += grad_main .^ 2
            m.W_ctx_grad[:, l2] += grad_ctx .^ 2
            m.b_main_grad[l1] += fdiff
            m.b_ctx_grad[l2] += fdiff
        end

        #= if n % 10 == 0 =#
        #=     println("iteration $n, cost $J") =#
        #= end =#
    end
end
```

The entire repo is at https://github.com/domluna/GloVe.jl, which might be helpful. I tried doing some of these operations with loops, but that allocates more memory (oddly enough) and gets slower. You'll notice the word vectors are indexed by column; I changed the representation to that to see if it would make a difference during the loop. It didn't seem to.
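For concreteness, this is the kind of loop rewrite of the slice assignment I mean (just a standalone sketch with placeholder matrices `A` and `B`, not the code from the repo). My understanding is that the explicit loop avoids the temporary created by `B[i, :]`, which is why I was surprised the looped version of `train!` allocated more.

```julia
# A[i, :] = 0.5 * B[i, :] written as an explicit loop,
# so no temporary row copy of B is allocated.
function scale_row!(A, B, i, c)
    @inbounds for j = 1:size(A, 2)
        A[i, j] = c * B[i, j]
    end
    return A
end

A = zeros(4, 4)
B = rand(4, 4)
scale_row!(A, B, 2, 0.5)   # equivalent to A[2, :] = 0.5 * B[2, :]
```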
The memory analysis was run on:

```
Julia Version 0.4.0-dev+4893
Commit eb5da26* (2015-05-19 11:51 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin14.4.0)
  CPU: Intel(R) Core(TM) i5-2557M CPU @ 1.70GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3
```

Here the model consists of 100x19 matrices and 100-element vectors (19 words in the vocab, 100-element word vectors):

```
@time GloVe.train!(model, GloVe.Adagrad(500))
1.990 seconds (6383 k allocations: 1162 MB, 10.82% gc time)
```

0.3 is a bit slower due to worse gc, but the memory usage is the same. Any help would be greatly appreciated!
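For anyone who wants to reproduce the per-line allocation numbers, Julia's built-in allocation tracking should show them; something like this (a sketch, assuming the model setup above; `comatrix` is built elsewhere):

```julia
# Start Julia with: julia --track-allocation=user
using GloVe

model = GloVe.Model(comatrix)           # co-occurrence matrix built elsewhere
GloVe.train!(model, GloVe.Adagrad(1))   # warm-up run so compilation isn't counted
Profile.clear_malloc_data()             # reset allocation counters after warm-up
GloVe.train!(model, GloVe.Adagrad(500))
# On exit, *.mem files next to the source files hold per-line allocation counts.
```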
