Hello all,

I recently acquired an account under a project on ORNL's Titan 
supercomputer, and had hoped to deploy some Julia code I had written and 
used on my university's HPC cluster, but I'm having some trouble. Titan 
only allows you to start processes on other nodes via the "aprun" command, 
which is basically the same as mpirun. Processes can communicate, but only 
via MPI (no sshing into compute nodes allowed).

I know there are MPI.jl and ClusterManagers.jl packages available. Does 
anyone have any idea how much work would be involved in creating a cluster 
manager that passes messages between workers via MPI rather than ssh?

Alternatively, I primarily use pmap for parallel computation, so I may be 
able to get by with a wrapper script: it would first compute which core 
will do which task in the pmap-like operation, then write a configuration 
file so that each Julia worker can look up its job from its MPI rank via 
MPI.jl. That might work, but it isn't as clean as the real pmap.
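The rank-based workaround I have in mind is something like the following sketch. It assumes MPI.jl's `Comm_rank`/`Comm_size` calls and skips the configuration file entirely by just assigning tasks round-robin by rank; `rank_pmap` is a made-up name, and unlike the real pmap there is no load balancing or result collection on a master:

```julia
using MPI

# Hypothetical pmap stand-in: every aprun-launched rank selects its own
# static slice of the task list, so no driver process or ssh is needed.
function rank_pmap(f, tasks)
    MPI.Init()
    comm   = MPI.COMM_WORLD
    rank   = MPI.Comm_rank(comm)   # this process's 0-based rank
    nprocs = MPI.Comm_size(comm)   # total number of MPI processes
    # Round-robin: rank r takes tasks r+1, r+1+nprocs, r+1+2*nprocs, ...
    myresults = [f(tasks[i]) for i in (rank+1):nprocs:length(tasks)]
    MPI.Finalize()
    return myresults
end
```

Each rank would then write its own results to a per-rank output file (or gather them with an MPI collective), since there is no shared master process to reduce into.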

Thanks in advance for any guidance y'all might be able to give!
-Josh.
