I think you are right about some references not being released yet.
If I change the while loop to include your way of replacing every reference,
the put! never actually gets executed; it just waits:
while true
    zeroMatrix = SharedArray(Float64,(nQ,nQ,3,nQ,nQ,nQ),pids=workers(),
                             init = x->inF(x,nQ))
    println("ran!")
    for i = 1:length(zeroMatrix.refs)
        put!(zeroMatrix.refs[i], 1)
    end
    @everywhere gc()
end
ran!
________
It runs once and then stalls. After C-c:
^CERROR: interrupt
in process_events at /usr/bin/../lib64/julia/sys.so
in wait at /usr/bin/../lib64/julia/sys.so (repeats 2 times)
in wait_full at /usr/bin/../lib64/julia/sys.so
____________
After C-d:
julia>
WARNING: Forcibly interrupting busy workers
error in running finalizer: InterruptException()
error in running finalizer: InterruptException()
WARNING: Unable to terminate all workers
[...]
It seems that after the init function not all workers are "done". I'll check
whether there's something weird in that part, but since the SharedArray is
being returned, I don't see any reason for this to be so.
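
One way to rule out the index arithmetic in inF is to check it in isolation,
without any workers. The following is only a sketch: chunkrange and tiles are
hypothetical helper names (not part of the code in this thread), `number`
stands in for myid()-1, and integer division is used in place of the
floating-point floor in inF, which I assume gives the same bounds for these
sizes.

```julia
# Hypothetical helper (not in the original code): the 1-based range of linear
# indices assigned to worker `number` out of `nw` workers, for `total` items.
# Integer division stands in for the floor(...) on floats used in inF.
function chunkrange(number, nw, total)
    startN = div((number - 1) * total, nw) + 1
    endN = div(number * total, nw)
    return startN:endN
end

# Check that the chunks cover 1:total exactly once, with no gaps or overlaps.
function tiles(nw, total)
    expected = 1
    for number = 1:nw
        r = chunkrange(number, nw, total)
        if first(r) != expected
            return false
        end
        expected = last(r) + 1
    end
    return expected == total + 1
end

nQ = 60
# Assumed worker count of 4, purely for illustration.
println(tiles(4, nQ*nQ*3))
```

If this holds for the worker count in use, the stall is more likely in how the
references are handled than in how the indices are split.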
On Wednesday, 10 December 2014 05:19:55 UTC-5, Tim Holy wrote:
>
> After your gc() it should be able to be unmapped, see
>
> https://github.com/JuliaLang/julia/blob/f3c355115ab02868ac644a5561b788fc16738443/base/mmap.jl#L113
>
>
> My guess is something in the parallel architecture is holding a reference.
> Have you tried going at this systematically from the internal
> representation of the SharedArray? For example, I might consider trying to
> put! new stuff in zeroMatrix.refs:
>
> for i = 1:length(zeroMatrix.refs)
>     put!(zeroMatrix.refs[i], 1)
> end
>
> before calling gc(). I don't know if this will work, but it's where I'd
> start experimenting.
>
> If you can fix this, please do submit a pull request.
>
> Best,
> --Tim
>
> On Tuesday, December 09, 2014 08:06:10 PM [email protected] wrote:
> > On Wednesday, December 10, 2014 12:28:29 PM UTC+10, benFranklin wrote:
> > > I've made a small example of the memory problems I've been running
> > > into. I can't find a way to deallocate a SharedArray,
> >
> > Someone more expert might find it, but I can't see anywhere that the
> > mmapped memory is unmapped.
> >
> > > if the code below runs once, it means the computer has enough memory
> > > to run this. If I can properly deallocate the memory I should be able
> > > to do it again, however, I run out of memory. Am I misunderstanding
> > > something about garbage collection in Julia?
> > >
> > > Thanks for your attention
> > >
> > > Code:
> > >
> > > @everywhere nQ = 60
> > >
> > > @everywhere function inF(x::SharedArray,nQ::Int64)
> > >
> > > number = myid()-1;
> > > targetLength = nQ*nQ*3
> > >
> > > startN = floor((number-1)*targetLength/nworkers()) + 1
> > > endN = floor(number*targetLength/nworkers())
> > >
> > > myIndexes = int64(startN:endN)
> > > for j in myIndexes
> > > inds = ind2sub((nQ,nQ,nQ),j)
> > > x[inds[1],inds[2],inds[3],:,:,:] = rand(nQ,nQ,nQ)
> > > end
> > >
> > >
> > > end
> > >
> > > while true
> > >     zeroMatrix = SharedArray(Float64,(nQ,nQ,3,nQ,nQ,nQ),pids=workers(),
> > >                              init = x->inF(x,nQ))
> > >     println("ran!")
> > >     @everywhere zeroMatrix = 1
> > >     @everywhere gc()
> > > end
> > >
> > > On Monday, 8 December 2014 23:43:03 UTC-5, Isaiah wrote:
> > >> Hopefully you will get an answer on pmap from someone more familiar
> > >> with the parallel stuff, but: have you tried splitting the init step?
> > >> (see the example in the manual for how to init an array in chunks done
> > >> by different workers). Just guessing though: I'm not sure if/how those
> > >> will be serialized if each worker is contending for the whole array.
> > >>
> > >> On Fri, Dec 5, 2014 at 4:23 PM, benFranklin <[email protected]> wrote:
> > >>> Hi all, I'm trying to figure out how to best initialize a
> > >>> SharedArray, using a C function to fill it up that computes a huge
> > >>> matrix in parts, and all comments are appreciated. To summarise: Is A,
> > >>> making an empty shared array, computing the matrix in parallel using
> > >>> pmap and then filling it up serially, better than using B, computing
> > >>> in parallel and storing in one step by using an init function in the
> > >>> SharedArray declaration?
> > >>>
> > >>>
> > >>> The difference tends to be that B uses a lot more memory, each
> > >>> process using the exact same amount of memory. However it is much
> > >>> faster than A, as the copy step takes longer than the computation,
> > >>> but in A most of the memory usage is in one process, using less
> > >>> memory overall.
> > >>>
> > >>> Any tips on how to do this better? Also, this pmap is how I'm
> > >>> handling more complex parallelizations in Julia. Any comments on that
> > >>> approach?
> > >>>
> > >>> Thanks a lot!
> > >>>
> > >>> Best,
> > >>> Ben
> > >>>
> > >>>
> > >>> CODE A:
> > >>>
> > >>> Is this, making an empty shared array, computing the matrix in
> > >>> parallel and then filling it up serially:
> > >>>
> > >>> function findZeroDividends(model::ModelPrivate)
> > >>>
> > >>> nW = length(model.vW)
> > >>> nZ = length(model.vZ)
> > >>> nK = length(model.vK)
> > >>> nQ = length(model.vQ)
> > >>>
> > >>> zeroMatrix = SharedArray(Float64,(nW,nZ,nK,nQ,nQ,nQ),pids=workers())
> > >>>
> > >>> input = [stateFindZeroK(w,z,k,model) for w in 1:nW, z in 1:nZ, k in 1:nK];
> > >>> results = pmap(findZeroInC,input);
> > >>>
> > >>> for w in 1:nW
> > >>> for z in 1:nZ
> > >>> for k in 1:nK
> > >>>
> > >>> zeroMatrix[w,z,k,:,:,:] = results[w + nW*((z-1) + nZ*(k-1))]
> > >>>
> > >>> end
> > >>>
> > >>> end
> > >>> end
> > >>>
> > >>> return zeroMatrix
> > >>> end
> > >>>
> > >>> _______________________
> > >>>
> > >>> CODE B:
> > >>>
> > >>> Better than these two:
> > >>>
> > >>> function start(x::SharedArray,nW::Int64,nZ::Int64,nK::Int64,model::ModelPrivate)
> > >>>
> > >>> for j in myid()-1:nworkers():(nW*nZ*nK)
> > >>> inds = ind2sub((nW,nZ,nK),j)
> > >>> x[inds[1],inds[2],inds[3],:,:,:] = findZeroInC(stateFindZeroK(inds[1],inds[2],inds[3],model))
> > >>> end
> > >>>
> > >>> x
> > >>>
> > >>> end
> > >>>
> > >>> function findZeroDividendsSmart(model::ModelPrivate)
> > >>>
> > >>> nW = length(model.vW)
> > >>> nZ = length(model.vZ)
> > >>> nK = length(model.vK)
> > >>> nQ = length(model.vQ)
> > >>>
> > >>> #input = [stateFindZeroK(w,z,k,model) for w in 1:nW, z in 1:nZ, k in 1:nK];
> > >>> #results = pmap(findZeroInC,input);
> > >>>
> > >>> zeroMatrix = SharedArray(Float64,(nW,nZ,nK,nQ,nQ,nQ),pids=workers(),
> > >>> init = x->start(x,nW,nZ,nK,model) )
> > >>>
> > >>> return zeroMatrix
> > >>> end
> > >>>
> > >>> ________________________
> > >>>
> > >>> The C function being called is inside this wrapper and returns the
> > >>> pointer to double *capitalChoices = (double
> > >>> *)malloc(sizeof(double)*nQ*nQ*nQ);
> > >>>
> > >>> function findZeroInC(state::stateFindZeroK)
> > >>>
> > >>> w = state.wealth
> > >>> z = state.z
> > >>> k = state.k
> > >>> model = state.model
> > >>>
> > >>> #findZeroInC(double wealth, int z,int k, double theta, double delta,
> > >>> # double* vK, int nK, double* vQ, int nQ, double* transition,
> > >>> # double betaGov)
> > >>>
> > >>> nQ = length(model.vQ)
> > >>>
> > >>> t = ccall((:findZeroInC,"findP.so"), Ptr{Float64},
> > >>>     (Float64,Int64,Int64,Float64,Float64,Ptr{Float64},Int64,
> > >>>      Ptr{Float64},Int64,Ptr{Float64},Float64),
> > >>>     model.vW[w],z-1,k-1,model.theta,model.delta,model.vK,
> > >>>     length(model.vK),model.vQ,nQ,model.transition,model.betaGov)
> > >>> if t == C_NULL
> > >>> error("NULL")
> > >>> end
> > >>>
> > >>> return pointer_to_array(t,(nQ,nQ,nQ),true)
> > >>>
> > >>> end
> > >>>
> > >>>
> > >>> <https://lh5.googleusercontent.com/-5rJqYh2oUqQ/VIIiFQUl2rI/AAAAAAAAAvM/gwAXG7N0Gxc/s1600/mem.png>
>
>