Hi all:

I have a question regarding nested @parallel macros.  Let's say I have a 
cluster and I use ssh or Condor to log into the nodes.  Each node has 
multiple cores, of course, so on each node I would like to run a few 
simulations in parallel or run a parallel solver.

I can add workers by using addprocs(["machine1", "machine2"]) or the HTC 
cluster manager, and the outer @parallel loop then evaluates a function on 
each worker.  This part, saved as test.jl, works fine:

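@everywhere function locally()
    ## addprocs(2)
    # inner loop: each iteration returns "[host:id]"; (*) concatenates them
    result = @parallel (*) for i in 1:2
        "[" * readall(`hostname`) * ":" * string(myid()) * "]"
    end
    readall(`hostname`) * " :: " * string(myid()) * " :: " * result
end

# outer loop: one iteration per worker, results collected with vcat
@parallel vcat for i in 1:nworkers()
    locally()
end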

julia> include("test.jl")
3-element Array{ASCIIString,1}:
 "mathe1\n :: 2 :: [mathe1\n:2][mathe3\n:3]"
 "mathe3\n :: 3 :: [mathe3\n:3][mathe1\n:2]"
 "mathe5\n :: 4 :: [mathe5\n:4][mathe1\n:2]"

Then I tried to add more workers on each node by calling addprocs() again 
on each worker, hoping they would be started locally.  But after 
uncommenting the line addprocs(2), I get an error:

julia> include("test.jl")
exception on 4: ERROR: assertion failed
 in add_workers at multi.jl:243
 in addprocs at multi.jl:1237
 in locally at /home/math/test.jl:4
 in anonymous at no file:12
 in anonymous at multi.jl:1279
 in anonymous at multi.jl:848
 in run_work_thunk at multi.jl:621
 in run_work_thunk at multi.jl:630
 in anonymous at task.jl:6
exception on 3: ERROR: assertion failed
 in add_workers at multi.jl:243
 in addprocs at multi.jl:1237
 in locally at /home/math/test.jl:4
 in anonymous at no file:12
 in anonymous at multi.jl:1279
 in anonymous at multi.jl:848
 in run_work_thunk at multi.jl:621
 in run_work_thunk at multi.jl:630
 in anonymous at task.jl:6
exception on 2: ERROR: assertion failed
 in add_workers at multi.jl:243
 in addprocs at multi.jl:1237
 in locally at /home/math/test.jl:4
 in anonymous at no file:12
 in anonymous at multi.jl:1279
 in anonymous at multi.jl:848
 in run_work_thunk at multi.jl:621
 in run_work_thunk at multi.jl:630
 in anonymous at task.jl:6
3-element Array{ErrorException,1}:
 ErrorException("assertion failed")
 ErrorException("assertion failed")
 ErrorException("assertion failed")


So my questions are:  Has anybody tried this before?  Is it possible out of 
the box?
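The failing assertion in add_workers is most likely the check that only 
process 1 (the master) may add workers, so addprocs cannot be nested inside 
a worker.  One possible workaround, sketched here for current Julia (the 
Distributed standard library, where @parallel is now @distributed) rather 
than the version used above, is to start every worker from process 1 and 
then group the worker ids by host, so that the inner level of parallelism 
only uses workers on the same node.  group_by_host, worker_hosts and 
run_per_node are hypothetical helpers, and the function passed to 
run_per_node must be defined on all processes with @everywhere.

```julia
using Distributed  # Julia >= 0.7; @parallel was renamed @distributed here

# Hypothetical helper: turn Dict(worker id => hostname) into
# Dict(hostname => sorted worker ids), i.e. "which workers live on each node".
function group_by_host(hostof)
    groups = Dict{String,Vector{Int}}()
    for (id, host) in hostof
        push!(get!(groups, String(host), Int[]), id)
    end
    for ids in values(groups)
        sort!(ids)  # deterministic order within each node
    end
    return groups
end

# Hypothetical helper: ask every worker for its hostname (run on process 1).
worker_hosts() = Dict(w => remotecall_fetch(gethostname, w) for w in workers())

# Hypothetical driver: for each node, run f over make_tasks(host) using only
# that node's workers. All workers were added up front by process 1, so no
# worker ever calls addprocs itself.
function run_per_node(f, make_tasks)
    groups = group_by_host(worker_hosts())
    results = Dict{String,Any}()
    @sync for (host, ids) in groups
        # one pmap per node, restricted to that node's workers via a WorkerPool
        @async results[host] = pmap(f, WorkerPool(ids), make_tasks(host))
    end
    return results
end
```

On process 1, after all workers are up, something like 
run_per_node(simulate, host -> 1:8) would evaluate simulate(i) for i in 1:8 
on each node's own workers, where simulate stands for one of your 
simulations.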

And a related question: let's say each node has eight cores.  How can I 
tell Julia to use at most eight workers on each node?
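For the per-node limit, addprocs in current Julia accepts (host, count) 
tuples as machine specs, and (host, :auto) starts one worker per core on 
that host.  A minimal sketch, assuming each node's core count is known in 
advance; capped_specs is a hypothetical helper that turns such a table into 
specs with at most cap workers per host:

```julia
# Hypothetical helper: build addprocs machine specs that start at most `cap`
# workers per host. `cores` maps hostname => number of cores (assumed known).
function capped_specs(cores::AbstractDict, cap::Integer)
    hosts = sort!(collect(keys(cores)))  # stable, predictable order
    # each (host, count) tuple asks addprocs for `count` workers on `host`
    return [(String(h), min(Int(cores[h]), Int(cap))) for h in hosts]
end

# Usage on process 1 (needs passwordless ssh and Julia on every node):
# using Distributed
# addprocs(capped_specs(Dict("machine1" => 16, "machine2" => 8), 8))
```

With the table in the usage line, this would start 8 workers on machine1 
and 8 on machine2; passing [("machine1", 8), ("machine2", 8)] to addprocs 
directly has the same effect.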

Cheers,
Clemens
