Hello all, I recently acquired an account under a project on ORNL's Titan supercomputer, and had hoped to deploy some Julia codes I had written and used on my university's HPC cluster, but I'm having some trouble. Titan only allows one to start processes on the compute nodes via the "aprun" command, which is essentially Cray's equivalent of mpirun. Processes can communicate, but only via MPI (no sshing into compute nodes allowed).
I know there is an MPI.jl package available and a ClusterManagers.jl package available. Does anyone have any idea how much work would be involved in trying to create a cluster manager that passes messages between workers via MPI rather than ssh? Alternatively, I primarily use pmap for parallel computation, so I may be able to get by with a wrapper script which first computes which core will do what task in the pmap-like operation, and then writes a configuration file from which each Julia worker can determine what job it should do given its MPI rank from MPI.jl. That might work, but it isn't as clean as the real pmap. Thanks in advance for any guidance y'all might be able to give! -Josh.
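For what it's worth, here is a minimal sketch of the wrapper-script idea I have in mind, assuming MPI.jl's `MPI.Init`, `MPI.Comm_rank`, and `MPI.Comm_size`. The task list and work function are placeholders, and it skips the configuration file entirely by deriving each worker's share of the tasks directly from its rank:

```julia
# Hypothetical pmap-like static task assignment by MPI rank
# (a sketch, not a real cluster manager).
using MPI

MPI.Init()
comm   = MPI.COMM_WORLD
rank   = MPI.Comm_rank(comm)   # 0-based rank of this process
nprocs = MPI.Comm_size(comm)   # total number of aprun-launched processes

tasks = collect(1:100)               # placeholder for the real task list

# Round-robin: rank r takes tasks r+1, r+1+nprocs, r+1+2*nprocs, ...
my_tasks = tasks[rank+1:nprocs:end]

# Placeholder work function standing in for the real per-task computation.
results = map(t -> t^2, my_tasks)

# Gathering results back to rank 0 is omitted here; MPI.jl's
# point-to-point or collective calls would be used for that.
MPI.Finalize()
```

Unlike the real pmap, this is a static partition decided up front, so it won't load-balance if tasks take unequal time, but it requires no inter-worker communication until the final gather.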
