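Your code looks correct to me, but allocating a SharedArray is fairly
expensive and should be done only once. As far as I can tell, every call
to chisq(n) creates a new shared-memory segment, and the old ones are not
released until the garbage collector runs, so the loop eventually hits the
open-file limit.

The following modification runs OK: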

function chisq(A::SharedArray{Float64})
    n = length(A)
    # fill the preallocated shared array in parallel
    @sync @parallel for i in 1:n
        A[i] = (rand() - rand())^2
    end
    return sum(A)
end

function calculate(n::Integer)
    A = SharedArray(Float64, n)  # allocate the shared buffer once
    b = 0.0
    for j in 1:n
        b += chisq(A)            # reuse the same buffer on every call
    end
    return b
end

# chisq(500^2) no longer applies: chisq now takes the array, not its length

calculate(500)  # runs OK with the modified functions
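
For anyone trying this on a newer Julia, here is a rough sketch of the same
"allocate once, reuse" idea. It assumes Julia 1.x, where SharedArray lives
in the SharedArrays standard library and @parallel has been renamed
@distributed. The name chisq!, the worker count of 2 and the size 50 are
only illustrative choices of mine, not part of the original post.

```julia
using Distributed

# Illustrative worker count (the original post used addprocs(4)).
nprocs() == 1 && addprocs(2)

# Load SharedArrays on the master and on every worker.
@everywhere using SharedArrays

# Fill the preallocated shared buffer in parallel and return its sum.
# The trailing ! is only a naming convention: the function mutates A.
function chisq!(A::SharedArray{Float64})
    @sync @distributed for i in 1:length(A)
        A[i] = (rand() - rand())^2
    end
    return sum(A)
end

# Allocate the shared segment once, then reuse it on every call.
function calculate(n::Integer)
    A = SharedArray{Float64}(n)
    b = 0.0
    for j in 1:n
        b += chisq!(A)
    end
    return b
end

total = calculate(50)  # illustrative size
println(total)
```

Because only one shared segment is created per call to calculate, the
number of open shared-memory objects no longer grows with the number of
chisq! calls.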


On Wednesday, 31 August 2016 07:11:41 UTC+2, Rafael Menegassi wrote:
>
> Dear all,
> I am quite new to Julia, so sorry if I did something wrong.
> I reduced the case to the simplest possible one.
>
> Using SharedArray within a sequence of functions:
>
> addprocs(4)
>
> function chisq(n::Integer)
>     A=SharedArray(Float64, n)
>     @sync @parallel for i in 1:n
>         A[i]=(rand()-rand())^2
>     end
>     sumsq=sum(A)
> end
>
> function calculate(n::Integer)
>     b=0.0
>     for j in 1:n
>         b+=chisq(n)
>     end
>     return b
> end
>
> chisq(500^2)  #ok no failure
>
> calculate(500) # fails
>
>
> A single call with the same number of evaluations (500 x 500) does not
> fail, whereas calculate crashes before chisq has been called 500 times.
>
> And the failure is:
>
>> *ERROR: SystemError: shm_open() failed for /jl005889eze42OrPYHS9RKjHZihQ: 
>> Too many open files*
>>
>> * in uv_error at ./libuv.jl:68 [inlined]*
>>
>> * in _link_pipe(::Ptr{Void}, ::Ptr{Void}) at ./stream.jl:596*
>>
>> * in link_pipe(::Base.PipeEndpoint, ::Bool, ::Base.PipeEndpoint, ::Bool) 
>> at ./stream.jl:652*
>>
>> * in setup_stdio(::Pipe, ::Bool) at ./process.jl:419*
>>
>> * in setup_stdio(::Base.##412#413{Cmd,Ptr{Void},Base.Process}, 
>> ::Tuple{Base.DevNullStream,Pipe,Base.TTY}) at ./process.jl:464*
>>
>> * in #spawn#411(::Nullable{Base.ProcessChain}, ::Function, ::Cmd, 
>> ::Tuple{Base.DevNullStream,Pipe,Base.TTY}, ::Bool, ::Bool) at 
>> ./process.jl:477*
>>
>> * in (::Base.#kw##spawn)(::Array{Any,1}, ::Base.#spawn, ::Cmd, 
>> ::Tuple{Base.DevNullStream,Pipe,Base.TTY}, ::Bool, ::Bool) at ./<missing>:0*
>>
>> * in open(::Cmd, ::String, ::Base.DevNullStream) at ./process.jl:539*
>>
>> * in read(::Cmd, ::Base.DevNullStream) at ./process.jl:574*
>>
>> * in readstring at ./process.jl:581 [inlined] (repeats 2 times)*
>>
>> * in print_shmem_limits(::Int64) at ./sharedarray.jl:488*
>>
>> * in shm_mmap_array(::Type{T}, ::Tuple{Int64}, ::String, ::UInt16) at 
>> ./sharedarray.jl:515*
>>
>> * in #SharedArray#786(::Bool, ::Array{Int64,1}, ::Type{T}, 
>> ::Type{Float64}, ::Tuple{Int64}) at ./sharedarray.jl:70*
>>
>> * in SharedArray{T,N}(::Type{Float64}, ::Tuple{Int64}) at 
>> ./sharedarray.jl:57*
>>
>> * in #SharedArray#793(::Array{Any,1}, ::Type{T}, ::Type{T}, ::Int64, 
>> ::Vararg{Int64,N}) at ./sharedarray.jl:113*
>>
>> * in chisq(::Int64) at ./REPL[2]:2*
>>
>> * in calculate(::Int64) at ./REPL[3]:4*
>>
>
> It also happens on 0.4.6, albeit with a slightly different error:
>
>> *ERROR: On worker 3:*
>>
>> *SystemError: shm_open() failed for /jl006428a6fpOftDBFr087xQnY6F: Too 
>> many open files*
>>
>> * in remotecall_fetch at multi.jl:747*
>>
>> * in remotecall_fetch at multi.jl:750*
>>
>> * in call_on_owner at multi.jl:793*
>>
>> * in wait at multi.jl:808*
>>
>> * in __SharedArray#138__ at sharedarray.jl:74*
>>
>> * in SharedArray at sharedarray.jl:117*
>>
>> * in chisq at none:2*
>>
>> * in calculate at none:4*
>>
>
> In fact, it still crashes even without the @sync @parallel on the for loop
> in chisq(), and even without addprocs().
>
> If @everywhere gc() is called in the second function (on each call to
> chisq), it doesn't crash, but the gc() calls take a long time.
>
> Is the garbage collector failing to notice that a function creating
> SharedArrays is being called many times, so that the system's limit on
> open files is hit?
>
> This might be a common case, for example when fitting parameters by
> optimizing a chi-square function: each simulation is done in parallel,
> while the optimization method calls the chi-square function many times...
>
> Or did I do something wrong?
>
> Best regards
> Rafael
>
> P.S.: I could also reproduce this on JuliaBox 0.5.0-dev (below) and 0.4.6,
> but not on a 32-bit Julia 0.4.5 system:
>
>> In [4]:
>>
>> calculate(500)
>>
>> LoadError: On worker 2:
>> SystemError: shm_open() failed for /jl000034opVp2HcAjt3ix2bbeW5A: Too many 
>> open files
>>  in _jl_spawn at ./process.jl:321
>>  in #293 at ./process.jl:474 [inlined]
>>  in setup_stdio at ./process.jl:462
>>  in #spawn#292 at ./process.jl:473
>>  in #spawn at ./<missing>:0
>>  in ip:0x7f5f467573de at /opt/julia-0.5.0-dev/lib/julia/sys.so:? (repeats 2 
>> times)
>>  in readstring at ./process.jl:577 [inlined] (repeats 2 times)
>>  in print_shmem_limits at ./sharedarray.jl:488
>>  in shm_mmap_array at ./sharedarray.jl:515
>>  in #657 at ./sharedarray.jl:80
>>  in #494 at ./multi.jl:1189
>>  in run_work_thunk at ./multi.jl:844
>>  in run_work_thunk at ./multi.jl:853 [inlined]
>>  in #474 at ./task.jl:54
>> while loading In[4], in expression starting on line 1
>>
>>  in #remotecall_fetch#482(::Array{Any,1}, ::Function, ::Function, 
>> ::Base.Worker, ::Base.RRID, ::Vararg{Any,N}) at ./multi.jl:904
>>  in remotecall_fetch(::Function, ::Base.Worker, ::Base.RRID, 
>> ::Vararg{Any,N}) at ./multi.jl:898
>>  in #remotecall_fetch#483(::Array{Any,1}, ::Function, ::Function, ::Int64, 
>> ::Base.RRID, ::Vararg{Any,N}) at ./multi.jl:907
>>  in remotecall_fetch(::Function, ::Int64, ::Base.RRID, ::Vararg{Any,N}) at 
>> ./multi.jl:907
>>  in call_on_owner(::Function, ::Future, ::Int64, ::Vararg{Int64,N}) at 
>> ./multi.jl:950
>>  in wait(::Future) at ./multi.jl:965
>>  in #SharedArray#654(::Bool, ::Array{Int64,1}, ::Type{T}, ::Type{Float64}, 
>> ::Tuple{Int64}) at ./sharedarray.jl:89
>>  in SharedArray{T,N}(::Type{Float64}, ::Tuple{Int64}) at ./sharedarray.jl:57
>>  in #SharedArray#661(::Array{Any,1}, ::Type{T}, ::Type{T}, ::Int64, 
>> ::Vararg{Int64,N}) at ./sharedarray.jl:113
>>  in chisq(::Int64) at ./In[2]:4
>>  in calculate(::Int64) at ./In[2]:14
>>  in execute_request(::ZMQ.Socket, ::IJulia.Msg) at 
>> /opt/julia_packages/.julia/v0.5/IJulia/src/execute_request.jl:164
>>  in eventloop(::ZMQ.Socket) at 
>> /opt/julia_packages/.julia/v0.5/IJulia/src/IJulia.jl:138
>>  in (::IJulia.##25#31)() at ./task.jl:309
>>
>>
>> ERROR (unhandled task failure): EOFError: read end of file
>>
>>
