I've made a small example of the memory problems I've been running into: I
can't find a way to deallocate a SharedArray. If the code below runs once,
the computer has enough memory for it. If I could properly deallocate that
memory, I should be able to run it again, but instead I run out of memory.
Am I misunderstanding something about garbage collection in Julia?

Thanks for your attention.
Code:
@everywhere nQ = 60

# Each worker fills its own contiguous share of the nQ*nQ*3 leading slabs.
@everywhere function inF(x::SharedArray,nQ::Int64)
    number = myid()-1;   # assumes worker ids are 2, 3, ..., nworkers()+1
    targetLength = nQ*nQ*3
    startN = floor((number-1)*targetLength/nworkers()) + 1
    endN = floor(number*targetLength/nworkers())
    myIndexes = int64(startN:endN)
    for j in myIndexes
        inds = ind2sub((nQ,nQ,nQ),j)
        x[inds[1],inds[2],inds[3],:,:,:] = rand(nQ,nQ,nQ)
    end
end

while true
    zeroMatrix = SharedArray(Float64,(nQ,nQ,3,nQ,nQ,nQ),pids=workers(), init = x->inF(x,nQ))
    println("ran!")
    # Rebind the name on every process, then ask each one to collect.
    @everywhere zeroMatrix = 1
    @everywhere gc()
end
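
For scale, the array in the loop above holds 60*60*3*60^3 Float64 values, about 18.7 GB, so a second allocation can only succeed once the first mapping is really gone. Below is a minimal sketch of one thing that might be worth trying: calling finalize on the array explicitly instead of only rebinding the name. It assumes a current Julia 1.x with the Distributed and SharedArrays standard libraries rather than the syntax used above, and footprint_bytes, fill_local! and alloc_and_release are hypothetical names for illustration.

```julia
using Distributed, SharedArrays
@everywhere using SharedArrays

# Bytes needed for a dense Float64 array with the given dimensions.
footprint_bytes(dims) = prod(dims) * sizeof(Float64)

# Hypothetical init function: each participating process fills only the
# index range that SharedArrays assigns to it.
@everywhere function fill_local!(S::SharedArray)
    for i in localindices(S)
        S[i] = rand()
    end
    return nothing
end

# Allocate and explicitly release a SharedArray several times in a row.
function alloc_and_release(dims; rounds = 3)
    completed = 0
    for _ in 1:rounds
        S = SharedArray{Float64}(dims; init = fill_local!, pids = workers())
        finalize(S)                      # run the finalizer now, not at a later GC
        for p in procs()
            remotecall_wait(GC.gc, p)    # collect on every process
        end
        completed += 1
    end
    return completed
end
```

Trying alloc_and_release with small dimensions first is safer than going straight to (60, 60, 3, 60, 60, 60). Whether the operating system hands the pages back right away is platform dependent, so this is an experiment to run rather than a guaranteed fix.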
On Monday, 8 December 2014 23:43:03 UTC-5, Isaiah wrote:
>
> Hopefully you will get an answer on pmap from someone more familiar with
> the parallel stuff, but: have you tried splitting the init step? (see the
> example in the manual for how to init an array in chunks done by different
> workers). Just guessing though: I'm not sure if/how those will be
> serialized if each worker is contending for the whole array.
>
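
The chunked init suggested here could be sketched roughly as follows, assuming a current Julia 1.x (Distributed and SharedArrays standard libraries) rather than the 0.3-era syntax elsewhere in this thread. chunk_range and fill_my_slabs! are hypothetical names, and writing k into each slab is only a stand-in for the real computation. Each process works out its own block from indexpids and procs, so nothing depends on worker ids being consecutive.

```julia
using Distributed, SharedArrays
@everywhere using SharedArrays

# Hypothetical helper: contiguous block number `idx` out of `nchunks` over 1:n.
@everywhere function chunk_range(n::Int, nchunks::Int, idx::Int)
    idx == 0 && return 1:0               # this process holds no part of the array
    lo = div((idx - 1) * n, nchunks) + 1
    hi = div(idx * n, nchunks)
    return lo:hi
end

# Hypothetical init function for a 3-d array: each process fills only its own
# slabs along the last dimension.
@everywhere function fill_my_slabs!(S::SharedArray)
    slabs = chunk_range(size(S, 3), length(procs(S)), indexpids(S))
    for k in slabs
        S[:, :, k] .= k                  # stand-in for the real computation
    end
    return nothing
end

# The constructor runs `init` on every process in `pids` and waits for all of them.
S = SharedArray{Float64}((4, 4, 6); init = fill_my_slabs!, pids = workers())
```

Because each process writes only to its own slabs, no two processes touch the same elements.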
> On Fri, Dec 5, 2014 at 4:23 PM, benFranklin wrote:
>
>> Hi all, I'm trying to figure out how best to initialize a SharedArray
>> using a C function that computes a huge matrix in parts. All comments are
>> appreciated. To summarise: is A, making an empty shared array, computing
>> the matrix in parallel with pmap and then filling it up serially, better
>> than B, computing in parallel and storing in one step through an init
>> function in the SharedArray declaration?
>>
>>
>> The difference tends to be that B uses a lot more memory, with each
>> process using the exact same amount. However, B is much faster than A,
>> because in A the copy step takes longer than the computation. In A most
>> of the memory usage is in one process, and it uses less memory overall.
>>
>> Any tips on how to do this better? Also, this pmap is how I'm handling
>> more complex parallelizations in Julia. Any comments on that approach?
>>
>> Thanks a lot!
>>
>> Best,
>> Ben
>>
>>
>> CODE A:
>>
>> Is this, making an empty shared array, computing the matrix in parallel
>> and then filling it up serially:
>>
>> function findZeroDividends(model::ModelPrivate)
>>
>>     nW = length(model.vW)
>>     nZ = length(model.vZ)
>>     nK = length(model.vK)
>>     nQ = length(model.vQ)
>>     zeroMatrix = SharedArray(Float64,(nW,nZ,nK,nQ,nQ,nQ),pids=workers())
>>
>>     input = [stateFindZeroK(w,z,k,model) for w in 1:nW, z in 1:nZ, k in 1:nK];
>>     results = pmap(findZeroInC,input);
>>
>>     for w in 1:nW
>>         for z in 1:nZ
>>             for k in 1:nK
>>                 zeroMatrix[w,z,k,:,:,:] = results[w + nW*((z-1) + nZ*(k-1))]
>>             end
>>         end
>>     end
>>
>>     return zeroMatrix
>> end
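
As a sanity check on the hand-written offset in CODE A, here is a small sketch that compares it with Julia's own column-major indexing. It assumes a current Julia 1.x, where LinearIndices and CartesianIndices take the place of ind2sub; manual_index and indices_agree are hypothetical names.

```julia
# Hypothetical helper: the hand-written column-major offset used in CODE A.
manual_index(w, z, k, nW, nZ) = w + nW * ((z - 1) + nZ * (k - 1))

# Check the manual offset against Julia's built-in index conversions.
function indices_agree(nW, nZ, nK)
    lin = LinearIndices((nW, nZ, nK))
    cart = CartesianIndices((nW, nZ, nK))
    for k in 1:nK, z in 1:nZ, w in 1:nW
        j = manual_index(w, z, k, nW, nZ)
        lin[w, z, k] == j || return false          # (w,z,k) -> linear index
        Tuple(cart[j]) == (w, z, k) || return false # linear index -> (w,z,k)
    end
    return true
end
```

If indices_agree returns true for the sizes in use, the copy loop is reading the pmap results in the same order the comprehension produced them.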
>>
>> _______________________
>>
>> CODE B:
>>
>> Better than these two functions, which compute in parallel and store in
>> one step:
>>
>> function start(x::SharedArray,nW::Int64,nZ::Int64,nK::Int64,model::ModelPrivate)
>>
>>     for j in myid()-1:nworkers():(nW*nZ*nK)
>>         inds = ind2sub((nW,nZ,nK),j)
>>         x[inds[1],inds[2],inds[3],:,:,:] = findZeroInC(stateFindZeroK(inds[1],inds[2],inds[3],model))
>>     end
>>
>>     x
>>
>> end
>>
>> function findZeroDividendsSmart(model::ModelPrivate)
>>
>>     nW = length(model.vW)
>>     nZ = length(model.vZ)
>>     nK = length(model.vK)
>>     nQ = length(model.vQ)
>>
>>     #input = [stateFindZeroK(w,z,k,model) for w in 1:nW, z in 1:nZ, k in 1:nK];
>>     #results = pmap(findZeroInC,input);
>>
>>     zeroMatrix = SharedArray(Float64,(nW,nZ,nK,nQ,nQ,nQ),pids=workers(), init = x->start(x,nW,nZ,nK,model) )
>>
>>     return zeroMatrix
>> end
>>
>> ________________________
>>
>> The C function being called is inside the wrapper below. It returns the
>> pointer capitalChoices, allocated in C as:
>>
>>     double *capitalChoices = (double *)malloc(sizeof(double)*nQ*nQ*nQ);
>>
>> function findZeroInC(state::stateFindZeroK)
>>
>>     w = state.wealth
>>     z = state.z
>>     k = state.k
>>     model = state.model
>>
>>     #findZeroInC(double wealth, int z,int k, double theta, double delta, double* vK,
>>     # int nK, double* vQ, int nQ, double* transition, double betaGov)
>>
>>     nQ = length(model.vQ)
>>
>>     # The C `int` parameters are declared as Cint here, not Int64, to match
>>     # the C signature above.
>>     t = ccall((:findZeroInC,"findP.so"),
>>               Ptr{Float64},
>>               (Float64,Cint,Cint,Float64,Float64,Ptr{Float64},Cint,Ptr{Float64},Cint,Ptr{Float64},Float64),
>>               model.vW[w],z-1,k-1,model.theta,model.delta,model.vK,length(model.vK),model.vQ,nQ,model.transition,model.betaGov)
>>     if t == C_NULL
>>         error("NULL")
>>     end
>>
>>     # `true` hands ownership of the malloc'd buffer to Julia.
>>     return pointer_to_array(t,(nQ,nQ,nQ),true)
>>
>> end
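
On a current Julia 1.x the role of pointer_to_array(t, dims, true) is played by unsafe_wrap with own = true. A minimal sketch, with Libc.malloc standing in for the C function (an assumption, since findP.so is not available here) and wrap_c_buffer as a hypothetical name:

```julia
# Hypothetical stand-in for the C routine: malloc an nQ^3 buffer of doubles,
# fill it, and hand ownership of the memory to Julia.
function wrap_c_buffer(nQ::Int)
    n = nQ^3
    ptr = convert(Ptr{Float64}, Libc.malloc(n * sizeof(Float64)))
    ptr == C_NULL && error("malloc returned NULL")
    for i in 1:n
        unsafe_store!(ptr, Float64(i), i)   # write element i (1-based)
    end
    # own = true: Julia calls free on the pointer once the array is collected
    return unsafe_wrap(Array, ptr, (nQ, nQ, nQ); own = true)
end

A = wrap_c_buffer(3)
```

With own = true Julia frees the buffer itself, so the C side must not free it as well. The wrapped array is column-major, so the first index varies fastest over the C buffer.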
>>
>>
>> <https://lh5.googleusercontent.com/-5rJqYh2oUqQ/VIIiFQUl2rI/AAAAAAAAAvM/gwAXG7N0Gxc/s1600/mem.png>
>>
>>
>>
>