Karonis Nicholas wrote:
Jan,
I don't know of a way to do what you're suggesting
(i.e., prologue - run MPI job - epilogue) using
Globus alone. There may be ways to do this with
higher-level "workflow" tools built atop Globus.
One simple solution that came to mind that might
work is to write a script that does the prologue,
then calls "mpirun" to run the MPI job, then does
the epilogue.
Sorry that I couldn't be of more help.
Nick,
Thanks for the reply. Your suggested solution won't work because
$PBS_NODEFILE is not defined for Globus jobs (the dreaded use of rsh/ssh
even for 'single' type jobs gets in the way), yet mpirun would need the
parameter -machinefile $PBS_NODEFILE. Even if it worked, it would not be
nice because it would require hardcoding site-specific mpirun syntax
into job scripts.
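For the record, the wrapper you describe would look roughly like the
sketch below (setup.sh, cleanup.sh and my_mpi_app are hypothetical
names); the -machinefile line is exactly where the undefined
$PBS_NODEFILE gets in the way:

```shell
#!/bin/sh
# run_job.sh -- a sketch of the prologue/MPI/epilogue wrapper, under the
# assumption that the job script may hardcode the local mpirun syntax.
set -e

./setup.sh                                         # prologue
mpirun -machinefile "$PBS_NODEFILE" ./my_mpi_app   # the MPI job proper
./cleanup.sh                                       # epilogue
```

It also ties the script to one site's mpirun flavor, which is the other
problem mentioned above.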
I've been examining different options since my original posting. The
general idea for the one I like most (so far) is sketched here:
https://bi.offis.de/wisent/tiki-index.php?page=Condor-GT4-BigJobs
The trick is to submit an 'mpi' type job, but to use a special Perl
script instead of the real MPI executable. It works fine, except in
environments where mpirun force-kills the processes after MPI_Finalize
(in such cases the cleanup phase won't run).
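In case it helps others, the trick boils down to something like the
sketch below. This is a rough shell rendition of the idea, not the
actual Perl script from the wiki page; the rank variable and the file
names are assumptions that depend on the local MPI implementation:

```shell
#!/bin/sh
# fake_mpi_app.sh -- submitted to Globus as the 'mpi' executable, so
# mpirun starts one copy of it per rank. Setup and cleanup are guarded
# to run only once; a real implementation would also need the other
# ranks to wait until setup has finished (omitted here for brevity).
rank="${PMI_RANK:-${OMPI_COMM_WORLD_RANK:-0}}"  # variable name varies by MPI

if [ "$rank" = "0" ]; then
    ./setup.sh              # prologue, executed on one rank only
fi

./real_mpi_app "$@"         # the actual MPI binary (hypothetical name)
status=$?

if [ "$rank" = "0" ]; then
    ./cleanup.sh            # epilogue -- never reached if mpirun
fi                          # force-kills processes after MPI_Finalize
exit $status
```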
I guess I'm going to submit an enhancement request for a feature that
would allow invoking a user-specified executable before and after mpirun
in the Globus-generated PBS script. The same option would also be
beneficial for jobs of type 'multiple'. Using "workflow" tools to
achieve the desired effect sounds like overkill to me. From a user's
perspective, the setup-MPI-cleanup thing is really a single, atomic job,
not three jobs that need to be coordinated.
Best regards,
Jan Ploski