Dear all,
I am quite new to Julia, so apologies if I have done something wrong.
I have reduced the case to the simplest one I could find.

Using SharedArray within a sequence of functions:

addprocs(4)

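function chisq(n::Integer)
    # a new SharedArray is allocated on every call
    A = SharedArray(Float64, n)
    @sync @parallel for i in 1:n
        A[i] = (rand() - rand())^2
    end
    return sum(A)
end

function calculate(n::Integer)
    b = 0.0
    # calls chisq (and so allocates a SharedArray) n times
    for j in 1:n
        b += chisq(n)
    end
    return b
end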

chisq(500^2)  #ok no failure

calculate(500) # fails


A single call with the same total number of evaluations (500 x 500) does not
fail, whereas calculate(500) crashes before chisq() has been called 500 times.

And the failure is:

> *ERROR: SystemError: shm_open() failed for /jl005889eze42OrPYHS9RKjHZihQ: 
> Too many open files*
>
> * in uv_error at ./libuv.jl:68 [inlined]*
>
> * in _link_pipe(::Ptr{Void}, ::Ptr{Void}) at ./stream.jl:596*
>
> * in link_pipe(::Base.PipeEndpoint, ::Bool, ::Base.PipeEndpoint, ::Bool) 
> at ./stream.jl:652*
>
> * in setup_stdio(::Pipe, ::Bool) at ./process.jl:419*
>
> * in setup_stdio(::Base.##412#413{Cmd,Ptr{Void},Base.Process}, 
> ::Tuple{Base.DevNullStream,Pipe,Base.TTY}) at ./process.jl:464*
>
> * in #spawn#411(::Nullable{Base.ProcessChain}, ::Function, ::Cmd, 
> ::Tuple{Base.DevNullStream,Pipe,Base.TTY}, ::Bool, ::Bool) at 
> ./process.jl:477*
>
> * in (::Base.#kw##spawn)(::Array{Any,1}, ::Base.#spawn, ::Cmd, 
> ::Tuple{Base.DevNullStream,Pipe,Base.TTY}, ::Bool, ::Bool) at ./<missing>:0*
>
> * in open(::Cmd, ::String, ::Base.DevNullStream) at ./process.jl:539*
>
> * in read(::Cmd, ::Base.DevNullStream) at ./process.jl:574*
>
> * in readstring at ./process.jl:581 [inlined] (repeats 2 times)*
>
> * in print_shmem_limits(::Int64) at ./sharedarray.jl:488*
>
> * in shm_mmap_array(::Type{T}, ::Tuple{Int64}, ::String, ::UInt16) at 
> ./sharedarray.jl:515*
>
> * in #SharedArray#786(::Bool, ::Array{Int64,1}, ::Type{T}, 
> ::Type{Float64}, ::Tuple{Int64}) at ./sharedarray.jl:70*
>
> * in SharedArray{T,N}(::Type{Float64}, ::Tuple{Int64}) at 
> ./sharedarray.jl:57*
>
> * in #SharedArray#793(::Array{Any,1}, ::Type{T}, ::Type{T}, ::Int64, 
> ::Vararg{Int64,N}) at ./sharedarray.jl:113*
>
> * in chisq(::Int64) at ./REPL[2]:2*
>
> * in calculate(::Int64) at ./REPL[3]:4*
>

It also happens on 0.4.6, albeit with a slightly different error:

> *ERROR: On worker 3:*
>
> *SystemError: shm_open() failed for /jl006428a6fpOftDBFr087xQnY6F: Too 
> many open files*
>
> * in remotecall_fetch at multi.jl:747*
>
> * in remotecall_fetch at multi.jl:750*
>
> * in call_on_owner at multi.jl:793*
>
> * in wait at multi.jl:808*
>
> * in __SharedArray#138__ at sharedarray.jl:74*
>
> * in SharedArray at sharedarray.jl:117*
>
> * in chisq at none:2*
>
> * in calculate at none:4*
>

In fact, it still crashes even without the @sync @parallel on the for loop in
chisq(), and it crashes even without addprocs().
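
For reference, the reduced case with no workers and no parallel loop looks roughly like the sketch below. It uses the same 0.4/0.5-era syntax as the code above; chisq_serial and calculate_serial are just names I made up for this variant. The only thing left that could plausibly matter is that every call still creates a fresh SharedArray.

```julia
# Serial variant: no addprocs(), no @sync @parallel.
# (Function names are illustrative, not from any library.)
function chisq_serial(n::Integer)
    A = SharedArray(Float64, n)   # still one new SharedArray per call
    for i in 1:n
        A[i] = (rand() - rand())^2
    end
    return sum(A)
end

function calculate_serial(n::Integer)
    b = 0.0
    for j in 1:n
        b += chisq_serial(n)
    end
    return b
end

# calculate_serial(500)   # this is the call that fails for me
```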

If @everywhere gc() is called in the second function (on every call to
chisq()), it doesn't crash, but the gc() time is long.
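
A cheaper variant of that workaround might be to collect only every few iterations instead of on every call. The sketch below is written in the same 0.4/0.5-era syntax; calculate_gc and the gc_every keyword are names I invented here, and the default of 50 is an arbitrary guess that I have not tuned against the system's open-file limit.

```julia
# Same chisq as in the original post.
function chisq(n::Integer)
    A = SharedArray(Float64, n)
    @sync @parallel for i in 1:n
        A[i] = (rand() - rand())^2
    end
    return sum(A)
end

# Hypothetical variant of calculate(): run gc() on all processes only
# every `gc_every` iterations (assumed value, not tuned).
function calculate_gc(n::Integer; gc_every::Integer=50)
    b = 0.0
    for j in 1:n
        b += chisq(n)
        if j % gc_every == 0
            @everywhere gc()
        end
    end
    return b
end
```

Whether this is frequent enough presumably depends on how low the open-file limit is on a given machine, so gc_every may need to be smaller.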

Is the garbage collector not noticing that a function creating SharedArrays is
being called many times, so that the system's limit on open files is reached?

This might be a common case, for example when fitting parameters by
optimizing a chi-square function: each simulation is done in parallel, while
the optimization method calls the chi-square function many times...
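
For that optimization scenario, one thing I could presumably do is allocate the SharedArray once and reuse it on every call, so that only one array is ever created. This is only a sketch in the same 0.4/0.5-era syntax; chisq! and calculate_reuse are names I made up, and I am assuming the array size stays the same between calls.

```julia
# Fill an existing SharedArray in place instead of allocating a new one.
# (chisq! and calculate_reuse are illustrative names.)
function chisq!(A::SharedArray{Float64,1})
    @sync @parallel for i in 1:length(A)
        A[i] = (rand() - rand())^2
    end
    return sum(A)
end

function calculate_reuse(n::Integer)
    A = SharedArray(Float64, n)   # allocated once, outside the loop
    b = 0.0
    for j in 1:n
        b += chisq!(A)            # every element is overwritten each call
    end
    return b
end
```

This sidesteps the question rather than answering it, since it does not explain why the repeated allocations are not cleaned up in time.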

Or did I do something wrong?

Best regards
Rafael

P.S.: I could also reproduce this on JuliaBox with 0.5.0-dev (below) and
0.4.6, but not on a 32-bit Julia 0.4.5 system:

> In [4]:
>
> calculate(500)
>
> LoadError: On worker 2:
> SystemError: shm_open() failed for /jl000034opVp2HcAjt3ix2bbeW5A: Too many 
> open files
>  in _jl_spawn at ./process.jl:321
>  in #293 at ./process.jl:474 [inlined]
>  in setup_stdio at ./process.jl:462
>  in #spawn#292 at ./process.jl:473
>  in #spawn at ./<missing>:0
>  in ip:0x7f5f467573de at /opt/julia-0.5.0-dev/lib/julia/sys.so:? (repeats 2 
> times)
>  in readstring at ./process.jl:577 [inlined] (repeats 2 times)
>  in print_shmem_limits at ./sharedarray.jl:488
>  in shm_mmap_array at ./sharedarray.jl:515
>  in #657 at ./sharedarray.jl:80
>  in #494 at ./multi.jl:1189
>  in run_work_thunk at ./multi.jl:844
>  in run_work_thunk at ./multi.jl:853 [inlined]
>  in #474 at ./task.jl:54
> while loading In[4], in expression starting on line 1
>
>  in #remotecall_fetch#482(::Array{Any,1}, ::Function, ::Function, 
> ::Base.Worker, ::Base.RRID, ::Vararg{Any,N}) at ./multi.jl:904
>  in remotecall_fetch(::Function, ::Base.Worker, ::Base.RRID, ::Vararg{Any,N}) 
> at ./multi.jl:898
>  in #remotecall_fetch#483(::Array{Any,1}, ::Function, ::Function, ::Int64, 
> ::Base.RRID, ::Vararg{Any,N}) at ./multi.jl:907
>  in remotecall_fetch(::Function, ::Int64, ::Base.RRID, ::Vararg{Any,N}) at 
> ./multi.jl:907
>  in call_on_owner(::Function, ::Future, ::Int64, ::Vararg{Int64,N}) at 
> ./multi.jl:950
>  in wait(::Future) at ./multi.jl:965
>  in #SharedArray#654(::Bool, ::Array{Int64,1}, ::Type{T}, ::Type{Float64}, 
> ::Tuple{Int64}) at ./sharedarray.jl:89
>  in SharedArray{T,N}(::Type{Float64}, ::Tuple{Int64}) at ./sharedarray.jl:57
>  in #SharedArray#661(::Array{Any,1}, ::Type{T}, ::Type{T}, ::Int64, 
> ::Vararg{Int64,N}) at ./sharedarray.jl:113
>  in chisq(::Int64) at ./In[2]:4
>  in calculate(::Int64) at ./In[2]:14
>  in execute_request(::ZMQ.Socket, ::IJulia.Msg) at 
> /opt/julia_packages/.julia/v0.5/IJulia/src/execute_request.jl:164
>  in eventloop(::ZMQ.Socket) at 
> /opt/julia_packages/.julia/v0.5/IJulia/src/IJulia.jl:138
>  in (::IJulia.##25#31)() at ./task.jl:309
>
>
> ERROR (unhandled task failure): EOFError: read end of file
>
>
