Hi all:
I have a question regarding nested @parallel macros. Let's say I have a
cluster and I use ssh or Condor to log into the nodes. Each node has
multiple cores, of course, so on each node I would like to run a few
simulations in parallel or run a parallel solver.
I can add workers by using addprocs(["machine1", "machine2"]) or the HTC
cluster manager. The outer @parallel macro works and I can evaluate a
function on each worker. This part works fine:
@everywhere function locally()
    ## addprocs(2)
    result = @parallel (*) for i in [1:2]
        "[" * readall(`hostname`) * ":" * string(myid()) * "]"
    end
    readall(`hostname`) * " :: " * string(myid()) * " :: " * result
end
@parallel vcat for i in [1:nworkers()]
    locally()
end
julia> include("test.jl")
3-element Array{ASCIIString,1}:
 "mathe1\n :: 2 :: [mathe1\n:2][mathe3\n:3]"
 "mathe3\n :: 3 :: [mathe3\n:3][mathe1\n:2]"
 "mathe5\n :: 4 :: [mathe5\n:4][mathe1\n:2]"
Next I tried to add more workers, ideally local to each node, by calling
addprocs() again on each worker. But after uncommenting the addprocs(2) line
inside locally(), I get an error:
julia> include("test.jl")
exception on 4: ERROR: assertion failed
 in add_workers at multi.jl:243
 in addprocs at multi.jl:1237
 in locally at /home/math/test.jl:4
 in anonymous at no file:12
 in anonymous at multi.jl:1279
 in anonymous at multi.jl:848
 in run_work_thunk at multi.jl:621
 in run_work_thunk at multi.jl:630
 in anonymous at task.jl:6
exception on 3: ERROR: assertion failed
 in add_workers at multi.jl:243
 in addprocs at multi.jl:1237
 in locally at /home/math/test.jl:4
 in anonymous at no file:12
 in anonymous at multi.jl:1279
 in anonymous at multi.jl:848
 in run_work_thunk at multi.jl:621
 in run_work_thunk at multi.jl:630
 in anonymous at task.jl:6
exception on 2: ERROR: assertion failed
 in add_workers at multi.jl:243
 in addprocs at multi.jl:1237
 in locally at /home/math/test.jl:4
 in anonymous at no file:12
 in anonymous at multi.jl:1279
 in anonymous at multi.jl:848
 in run_work_thunk at multi.jl:621
 in run_work_thunk at multi.jl:630
 in anonymous at task.jl:6
3-element Array{ErrorException,1}:
 ErrorException("assertion failed")
 ErrorException("assertion failed")
 ErrorException("assertion failed")
So my questions are: Has anybody tried this before? Is it possible out of
the box?
And a related question: Let's say each node has eight cores. How can I
tell Julia to use at most eight workers on each node?
Cheers,
Clemens