It seems to me that your code is correct, BUT
allocating a SharedArray is a bit expensive and should be done only once.
The following modification runs OK:
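function chisq(A::SharedArray{Float64})
    n = length(A)
    # Overwrite the preallocated array in parallel; nothing is allocated per call
    @sync @parallel for i in 1:n
        A[i] = (rand() - rand())^2
    end
    return sum(A)
end

function calculate(n::Integer)
    A = SharedArray(Float64, n)  # allocated once, reused by every chisq call
    b = 0.0
    for j in 1:n
        b += chisq(A)
    end
    return b
end

# chisq(500^2) no longer applies as written: chisq now takes a SharedArray
calculate(500)  # runs OK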
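
If the array cannot be allocated once (for example, if its size changes between calls), the workaround described in the original message below, calling @everywhere gc(), could perhaps be applied every few iterations instead of on every call, to reduce the gc() cost. The following is only a sketch in the same 0.4/0.5-era syntax used in this thread; the name calculate_with_gc and the keyword gc_every (with its default of 50) are made up for illustration and have not been tuned or checked against any particular open-files limit.

```julia
addprocs(4)

# Same chisq as in the original post: a fresh SharedArray is allocated per call.
function chisq(n::Integer)
    A = SharedArray(Float64, n)
    @sync @parallel for i in 1:n
        A[i] = (rand() - rand())^2
    end
    return sum(A)
end

# Hypothetical variant of calculate: run the garbage collector on all
# processes every `gc_every` calls instead of on every call.
# `gc_every = 50` is an arbitrary assumption, not a measured value.
function calculate_with_gc(n::Integer; gc_every::Integer=50)
    b = 0.0
    for j in 1:n
        b += chisq(n)
        if j % gc_every == 0
            @everywhere gc()
        end
    end
    return b
end

calculate_with_gc(500)
```

A smaller gc_every collects more often (slower, but fewer shared-memory segments alive at once); a larger one does the opposite and may hit the limit again.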
On Wednesday, 31 August 2016 at 07:11:41 UTC+2, Rafael Menegassi wrote:
>
> Dear all,
> I am quite new to Julia, so sorry if I did something wrong.
> I reduced the case to the simplest possible one.
>
> Using SharedArray within a sequence of functions:
>
> addprocs(4)
>
> function chisq(n::Integer)
> A=SharedArray(Float64, n)
> @sync @parallel for i in 1:n
> A[i]=(rand()-rand())^2
> end
> sumsq=sum(A)
> end
>
> function calculate(n::Integer)
> b=0.0
> for j in 1:n
> b+=chisq(n)
> end
> return b
> end
>
> chisq(500^2) #ok no failure
>
> calculate(500) # fails
>
>
> A single call with the same number of evaluations (500 x 500) does not fail,
> while calculate() crashes before chisq() has been called 500 times.
>
> And the failure is:
>
>> *ERROR: SystemError: shm_open() failed for /jl005889eze42OrPYHS9RKjHZihQ:
>> Too many open files*
>>
>> * in uv_error at ./libuv.jl:68 [inlined]*
>>
>> * in _link_pipe(::Ptr{Void}, ::Ptr{Void}) at ./stream.jl:596*
>>
>> * in link_pipe(::Base.PipeEndpoint, ::Bool, ::Base.PipeEndpoint, ::Bool)
>> at ./stream.jl:652*
>>
>> * in setup_stdio(::Pipe, ::Bool) at ./process.jl:419*
>>
>> * in setup_stdio(::Base.##412#413{Cmd,Ptr{Void},Base.Process},
>> ::Tuple{Base.DevNullStream,Pipe,Base.TTY}) at ./process.jl:464*
>>
>> * in #spawn#411(::Nullable{Base.ProcessChain}, ::Function, ::Cmd,
>> ::Tuple{Base.DevNullStream,Pipe,Base.TTY}, ::Bool, ::Bool) at
>> ./process.jl:477*
>>
>> * in (::Base.#kw##spawn)(::Array{Any,1}, ::Base.#spawn, ::Cmd,
>> ::Tuple{Base.DevNullStream,Pipe,Base.TTY}, ::Bool, ::Bool) at ./<missing>:0*
>>
>> * in open(::Cmd, ::String, ::Base.DevNullStream) at ./process.jl:539*
>>
>> * in read(::Cmd, ::Base.DevNullStream) at ./process.jl:574*
>>
>> * in readstring at ./process.jl:581 [inlined] (repeats 2 times)*
>>
>> * in print_shmem_limits(::Int64) at ./sharedarray.jl:488*
>>
>> * in shm_mmap_array(::Type{T}, ::Tuple{Int64}, ::String, ::UInt16) at
>> ./sharedarray.jl:515*
>>
>> * in #SharedArray#786(::Bool, ::Array{Int64,1}, ::Type{T},
>> ::Type{Float64}, ::Tuple{Int64}) at ./sharedarray.jl:70*
>>
>> * in SharedArray{T,N}(::Type{Float64}, ::Tuple{Int64}) at
>> ./sharedarray.jl:57*
>>
>> * in #SharedArray#793(::Array{Any,1}, ::Type{T}, ::Type{T}, ::Int64,
>> ::Vararg{Int64,N}) at ./sharedarray.jl:113*
>>
>> * in chisq(::Int64) at ./REPL[2]:2*
>>
>> * in calculate(::Int64) at ./REPL[3]:4*
>>
>
> It also happens on 0.4.6, albeit with a slightly different error:
>
>> *ERROR: On worker 3:*
>>
>> *SystemError: shm_open() failed for /jl006428a6fpOftDBFr087xQnY6F: Too
>> many open files*
>>
>> * in remotecall_fetch at multi.jl:747*
>>
>> * in remotecall_fetch at multi.jl:750*
>>
>> * in call_on_owner at multi.jl:793*
>>
>> * in wait at multi.jl:808*
>>
>> * in __SharedArray#138__ at sharedarray.jl:74*
>>
>> * in SharedArray at sharedarray.jl:117*
>>
>> * in chisq at none:2*
>>
>> * in calculate at none:4*
>>
>
> In fact, even without the @sync @parallel on the for loop in function chisq()
> it still crashes; it crashes even without addprocs().
>
> If @everywhere gc() is called in the second function (on each call to
> chisq()), it doesn't crash (but gc() takes a long time).
>
> Is garbage collection failing to notice that a function creating SharedArrays
> is being called many times, so that the system's limit on open files is hit?
>
> This might be a common case, for example when fitting parameters by
> optimizing a chi-square function, with each simulation done in
> parallel and the optimization method calling the chi-square function many times...
>
> Or did I do something wrong?
>
> Best regards
> Rafael
>
> P.S.: I could also reproduce this on JuliaBox 0.5.0-dev (below) and 0.4.6, but
> not on a 32-bit Julia 0.4.5 system:
>
>> In [4]:
>>
>> calculate(500)
>>
>> LoadError: On worker 2:
>> SystemError: shm_open() failed for /jl000034opVp2HcAjt3ix2bbeW5A: Too many
>> open files
>> in _jl_spawn at ./process.jl:321
>> in #293 at ./process.jl:474 [inlined]
>> in setup_stdio at ./process.jl:462
>> in #spawn#292 at ./process.jl:473
>> in #spawn at ./<missing>:0
>> in ip:0x7f5f467573de at /opt/julia-0.5.0-dev/lib/julia/sys.so:? (repeats 2
>> times)
>> in readstring at ./process.jl:577 [inlined] (repeats 2 times)
>> in print_shmem_limits at ./sharedarray.jl:488
>> in shm_mmap_array at ./sharedarray.jl:515
>> in #657 at ./sharedarray.jl:80
>> in #494 at ./multi.jl:1189
>> in run_work_thunk at ./multi.jl:844
>> in run_work_thunk at ./multi.jl:853 [inlined]
>> in #474 at ./task.jl:54
>> while loading In[4], in expression starting on line 1
>>
>> in #remotecall_fetch#482(::Array{Any,1}, ::Function, ::Function,
>> ::Base.Worker, ::Base.RRID, ::Vararg{Any,N}) at ./multi.jl:904
>> in remotecall_fetch(::Function, ::Base.Worker, ::Base.RRID,
>> ::Vararg{Any,N}) at ./multi.jl:898
>> in #remotecall_fetch#483(::Array{Any,1}, ::Function, ::Function, ::Int64,
>> ::Base.RRID, ::Vararg{Any,N}) at ./multi.jl:907
>> in remotecall_fetch(::Function, ::Int64, ::Base.RRID, ::Vararg{Any,N}) at
>> ./multi.jl:907
>> in call_on_owner(::Function, ::Future, ::Int64, ::Vararg{Int64,N}) at
>> ./multi.jl:950
>> in wait(::Future) at ./multi.jl:965
>> in #SharedArray#654(::Bool, ::Array{Int64,1}, ::Type{T}, ::Type{Float64},
>> ::Tuple{Int64}) at ./sharedarray.jl:89
>> in SharedArray{T,N}(::Type{Float64}, ::Tuple{Int64}) at ./sharedarray.jl:57
>> in #SharedArray#661(::Array{Any,1}, ::Type{T}, ::Type{T}, ::Int64,
>> ::Vararg{Int64,N}) at ./sharedarray.jl:113
>> in chisq(::Int64) at ./In[2]:4
>> in calculate(::Int64) at ./In[2]:14
>> in execute_request(::ZMQ.Socket, ::IJulia.Msg) at
>> /opt/julia_packages/.julia/v0.5/IJulia/src/execute_request.jl:164
>> in eventloop(::ZMQ.Socket) at
>> /opt/julia_packages/.julia/v0.5/IJulia/src/IJulia.jl:138
>> in (::IJulia.##25#31)() at ./task.jl:309
>>
>>
>> ERROR (unhandled task failure): EOFError: read end of file
>>
>>