On this Julia version:

  Version 0.5.0 (2016-09-19 18:14 UTC)
  Official http://julialang.org/ release
  x86_64-pc-linux-gnu

running on:

Ubuntu 14.04.3 LTS

I am trying to do a Monte Carlo simulation in parallel across 36 workers.  

I have two problems (at least).

1.  Some of the workers terminate at the beginning of the simulation, but I 
don't understand the error message:

Worker 5 terminated.
ERROR (unhandled task failure): ProcessExitedException()
 in yieldto(::Task, ::ANY) at ./event.jl:136
 in wait() at ./event.jl:169
 in wait(::Condition) at ./event.jl:27
 in wait(::Channel{Any}) at ./channels.jl:92
 in take!(::Channel{Any}) at ./channels.jl:73
 in #remotecall_fetch#606(::Array{Any,1}, ::Function, ::Function, ::Base.Worker, ::Function, ::Vararg{Any,N}) at ./multi.jl:1066
 in remotecall_fetch(::Function, ::Base.Worker, ::Function, ::Vararg{Any,N}) at ./multi.jl:1062
 in #remotecall_fetch#609(::Array{Any,1}, ::Function, ::Function, ::Int64, ::Function, ::Vararg{Any,N}) at ./multi.jl:1080
 in remotecall_fetch(::Function, ::Int64, ::Function, ::Vararg{Any,N}) at ./multi.jl:1080
 in (::Base.##667#668{Base.#+,ProjectModule.##45#47{Int64,Array{Any,1},Array{Any,2}},UnitRange{Int64},Array{UnitRange{Int64},1}})() at ./multi.jl:1998

This is not a huge problem, as the rest of the workers keep going and can
finish the simulation, but I would like to understand what is going on, if
possible (and, ideally, how to fix it so that those workers can be used).

2.  The more important problem is that at the end of the simulation I run
into other errors and nothing is returned.  My (uninformed and probably
wrong) guess is that the program does not like the fact that different
workers finish at different times.  The errors I get are:

ERROR (unhandled task failure): EOFError: read end of file
Worker 16 terminated.
ERROR (unhandled task failure): ProcessExitedException()
 in yieldto(::Task, ::ANY) at ./event.jl:136
 in wait() at ./event.jl:169
 in wait(::Condition) at ./event.jl:27
 in wait(::Channel{Any}) at ./channels.jl:92
 in take!(::Channel{Any}) at ./channels.jl:73
 in #remotecall_fetch#606(::Array{Any,1}, ::Function, ::Function, ::Base.Worker, ::Function, ::Vararg{Any,N}) at ./multi.jl:1066
 in remotecall_fetch(::Function, ::Base.Worker, ::Function, ::Vararg{Any,N}) at ./multi.jl:1062
 in #remotecall_fetch#609(::Array{Any,1}, ::Function, ::Function, ::Int64, ::Function, ::Vararg{Any,N}) at ./multi.jl:1080
 in remotecall_fetch(::Function, ::Int64, ::Function, ::Vararg{Any,N}) at ./multi.jl:1080
 in (::Base.##667#668{Base.#+,ProjectModule.##45#47{Int64,Array{Any,1},Array{Any,2}},UnitRange{Int64},Array{UnitRange{Int64},1}})() at ./multi.jl:1998

And:

ERROR: LoadError: ProcessExitedException()
 in wait(::Task) at ./task.jl:135
 in collect_to!(::Array{Array{Float64,2},1}, ::Base.Generator{Array{Task,1},Base.#wait}, ::Int64, ::Int64) at ./array.jl:340
 in collect(::Base.Generator{Array{Task,1},Base.#wait}) at ./array.jl:308
 in preduce(::Function, ::Function, ::UnitRange{Int64}) at ./multi.jl:2002
 in (::ProjectModule.##44#46{Int64,Array{Any,1},Array{Any,2},Int64})() at ./multi.jl:2011
 in macro expansion at ./task.jl:326 [inlined]
 in #OuterSim#43(::Int64, ::Int64, ::Int64, ::Array{Any,1}, ::Array{Any,2}, ::Function, ::Int64) at /home/ubuntu/dynhosp/DataStructs.jl:1321
 in (::ProjectModule.#kw##OuterSim)(::Array{Any,1}, ::ProjectModule.#OuterSim, ::Int64) at ./<missing>:0
 in include_from_node1(::String) at ./loading.jl:488
 in process_options(::Base.JLOptions) at ./client.jl:262
 in _start() at ./client.jl:318
while loading /home/ubuntu/dynhosp/Run.jl, in expression starting on line 9

And finally: 

ERROR (unhandled task failure): On worker 9:
ArgumentError: Dict(kv): kv needs to be an iterator of tuples or pairs
 in Type at ./dict.jl:388
 in CalcWTP at /home/ubuntu/dynhosp/DataStructs.jl:728
 in WTPMap at /home/ubuntu/dynhosp/DataStructs.jl:747
 in #PSim#32 at /home/ubuntu/dynhosp/DataStructs.jl:1024
 in #45 at ./multi.jl:2016
 in #625 at ./multi.jl:1421
 in run_work_thunk at ./multi.jl:1001
 in macro expansion at ./multi.jl:1421 [inlined]
 in #624 at ./event.jl:68
 in #remotecall_fetch#606(::Array{Any,1}, ::Function, ::Function, ::Base.Worker, ::Function, ::Vararg{Any,N}) at ./multi.jl:1070
 in remotecall_fetch(::Function, ::Base.Worker, ::Function, ::Vararg{Any,N}) at ./multi.jl:1062
 in #remotecall_fetch#609(::Array{Any,1}, ::Function, ::Function, ::Int64, ::Function, ::Vararg{Any,N}) at ./multi.jl:1080
 in remotecall_fetch(::Function, ::Int64, ::Function, ::Vararg{Any,N}) at ./multi.jl:1080
 in (::Base.##667#668{Base.#+,ProjectModule.##45#47{Int64,Array{Any,1},Array{Any,2}},UnitRange{Int64},Array{UnitRange{Int64},1}})() at ./multi.jl:1998
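For reference, this last `ArgumentError` is what `Dict` raises when it is given a collection whose elements are not pairs or 2-tuples. The body of `CalcWTP` is not shown here, so the sketch below is only a generic illustration under that assumption (the names `ids` and `wtp` are hypothetical, and Julia 0.5 or later is assumed) of a call that triggers the error and two ways of building the same dictionary that avoid it:

```julia
# Hypothetical data: parallel vectors of keys and values.
ids = ["a", "b", "c"]
wtp = [1.0, 2.5, 0.75]

# Fails: the elements of `wtp` are plain numbers, not pairs or 2-tuples,
# so Dict cannot split them into keys and values and throws ArgumentError.
bad = try
    Dict(wtp)
    false
catch err
    isa(err, ArgumentError)
end

# Works: zip yields (key, value) tuples.
d1 = Dict(zip(ids, wtp))

# Also works: a generator of key => value pairs.
d2 = Dict(k => v for (k, v) in zip(ids, wtp))
```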

The actual function I am calling is:

function OuterSim(MCcount::Int; T1::Int64 = 3, dim1::Int64 = 290,
                  dim2::Int64 = 67, fi = fips, da = data05)
  outp = @sync @parallel (+) for j = 1:MCcount
    Texas = MakeNew(fi, da)
    eq_patients = NewPatients()
    neq_patients = NewPatients()
    ResultsOut(NewSim(T1, Texas, eq_patients), PSim(T1, neq_patients); T = T1)
  end
  outp[:,1] = outp[:,1]/MCcount
  return outp
end

I added the "@sync" on the suggestion of a colleague here; I am not sure it
is necessary.  (FWIW, I get the errors above on Ubuntu whether or not I
include it.)

This code *does* run to completion without errors on my own home machine
(running OS X, also Julia v0.5), which has only four cores.

I would love your feedback!

Thanks - 

AB
