Hi,

I'm trying to launch a parallel Julia session with 4 worker processes, 2 
on host01 and 2 on host02 (launching from host01), but it fails in a way I 
can't explain.

I can start two workers, one on host01 and one on host02, with the 
following machinefile.txt:

host02
host01

and typing:

$ julia --machinefile machinefile.txt

Then, for example:

julia> @everywhere run(`hostname`)
host01
        From worker 2:  host01
        From worker 3:  host02

I can also add 2 more workers on host01, where machinefile.txt is now:

host01
host02
host01
host01


and running the same check:

julia> @everywhere run(`hostname`)
host01
        From worker 3:  host02
        From worker 2:  host01
        From worker 5:  host01
        From worker 4:  host01


However, if I try to start 2 workers on each machine, with the following 
machinefile.txt:

host01
host01
host02
host02

then julia hangs at startup, and after a while I get this error:

Master process (id 1) could not connect within 60.0 seconds.
exiting.
Master process (id 1) could not connect within 60.0 seconds.
exiting.
Master process (id 1) could not connect within 60.0 seconds.
exiting.
Master process (id 1) could not connect within 60.0 seconds.
exiting.
ERROR: connect: connection timed out (ETIMEDOUT)
 in wait at ./task.jl:284
 in wait at ./task.jl:194
 in stream_wait at stream.jl:263
 in wait_connected at stream.jl:301
 in Worker at multi.jl:113
 in create_worker at multi.jl:1064
 in start_cluster_workers at multi.jl:1028
 in addprocs_internal at multi.jl:1234
 in addprocs at multi.jl:1244
 in process_options at ./client.jl:240
 in _start at ./client.jl:354
 in _start_3B_1716 at /usr/bin/../lib/x86_64-linux-gnu/julia/sys.so
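
For reference, the failing setup can be reproduced from the shell roughly as follows. This is only a sketch: the file name machinefile.txt is the one used above, and the sort/uniq step is just an illustrative sanity check that the file really asks for two workers per host.

```shell
# Write the machinefile: one line per worker, two workers per host
printf '%s\n' host01 host01 host02 host02 > machinefile.txt

# Sanity check: count how many workers each host is asked to start
sort machinefile.txt | uniq -c

# Then launch as before (commented out here, it needs both hosts reachable):
# julia --machinefile machinefile.txt
```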

Is this a problem with my network or system setup?

Thanks.
