Hello everyone, 

Suppose I have a very large array in the main process and I use 
remotecall() or pmap() to copy the array to worker processes, which modify 
it in parallel (all modifications are wrapped in a function). After the 
call returns from the worker process, will the copied array be released?

See the following REPL session, run on my laptop (OS X with 8GB of memory):

$ ~/julia/julia -p 1

julia> A = randn(10000*10000);
# According to the system monitor, the main process of julia used about
# 800MB of memory, and the worker process used about 80MB

julia> A[1] = remotecall_fetch(2, x->(x[1] = 1.0), A);
# Now the main process used about 1.6GB of memory, and the worker process
# used about 800MB

julia> @everywhere gc()

# Now both the main process and the worker process used about 800MB of
# memory; the copied array in the worker process wasn't released


julia> A[1] = remotecall_fetch(2, x->(x[1] = 2.0), A);
# If I repeat the computation, the situation gets worse. Now the
# worker process used about 1.6GB


julia> A[1] = remotecall_fetch(2, x->(x[1] = 3.0), A);
# worker process used about 2.4GB now


julia> A[1] = remotecall_fetch(2, x->(x[1] = 4.0), A);
# worker process used about 3GB


julia> A[1] = remotecall_fetch(2, x->(x[1] = 5.0), A);
# worker process used about 3.8GB

In my real code, the array is even larger and there are more processes. 
After one or two iterations of pmap(), the computation becomes much slower 
than in the first iteration. I think this is because the huge memory 
consumption constantly triggers page swapping.

PS. In fact, I would prefer to use shared memory or multithreading in my 
project, but I don't know how to share an object of a user-defined type; 
the only thing I know how to share is a shared array.
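
For reference, here is a minimal sketch of the shared array approach I mean, 
for a plain Float64 array. It is written against Julia 1.x, which is an 
assumption: there Distributed and SharedArrays are standard-library modules 
and remotecall_fetch takes the function first and the worker id second, 
unlike in the session above. The array length is arbitrary, and this does not 
solve the user-defined type case.

```julia
using Distributed
addprocs(1)                      # one local worker, like `julia -p 1`
@everywhere using SharedArrays   # workers need the module to receive a SharedArray

# Backed by a shared memory segment, so a worker on the same machine
# maps the same memory instead of receiving a serialized copy.
S = SharedArray{Float64}(10_000)
fill!(S, 0.0)

w = first(workers())

# The worker writes into the shared segment; only `nothing` is sent back.
remotecall_fetch(s -> (s[1] = 1.0; nothing), w, S)

# The write made on the worker should now be visible in the main process.
println(S[1])
```

With this pattern the array is not serialized on every call, so repeated 
calls should not keep adding array-sized copies to the worker.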

Regards, Yang Zhixuan
 
