> We should probably enhance the mpi_cmd_for function such that it checks
> whether mpirun is available, and falls back to different commands if not,
> including srun.
> It's not terribly difficult to do this I think, if you're familiar with
> Python at least: rather than hardcoding "mpirun" when defining the
> "mpi_cmds" dictionary, it should insert whichever command is available
> first (I think?).
>
That sounds about right. Or make it a default "Hash" (or whatever the Python
term is for a key-value data structure) whose top-level keys are based on a
"scheduler" config option, with a "default" entry that preserves the current
behavior? I can imagine the defaults working 99% of the time, but having the
ability to override them all would be nice. For example, MVAPICH2 built with
--with-pm=no requires the use of "srun", but OpenMPI still builds mpirun even
when built against the SLURM PMI library. A way to customize the "mpirun" (or
srun) command would also be ideal (see the issue below regarding the wrong
partition being used). For now I can tweak the behavior using environment
variables.
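A rough Python sketch of the scheduler-keyed lookup with a fallback to the first available command. This is not the actual mpi_cmd_for code: the names LAUNCHERS_BY_SCHEDULER, pick_launcher and launch_cmd, the "scheduler" option, and the candidate lists are all made up for illustration.

```python
import shutil

# Hypothetical mapping from a "scheduler" config option to launcher
# candidates, tried in order. "default" mirrors the current behavior.
LAUNCHERS_BY_SCHEDULER = {
    "default": ["mpirun"],
    "slurm": ["srun", "mpirun"],
}


def pick_launcher(scheduler="default", override=None, which=shutil.which):
    """Return the first launcher found on PATH for the given scheduler.

    `override` lets a site force a specific command (e.g. "srun").
    `which` is injectable so the lookup can be tested without touching PATH.
    """
    if override:
        return override
    candidates = LAUNCHERS_BY_SCHEDULER.get(
        scheduler, LAUNCHERS_BY_SCHEDULER["default"]
    )
    for cmd in candidates:
        if which(cmd):
            return cmd
    raise RuntimeError("no MPI launcher found, tried: %s" % ", ".join(candidates))


def launch_cmd(launcher, nr_ranks, exe):
    """Build a simple '<launcher> -n <ranks> <exe>' command line."""
    return "%s -n %d %s" % (launcher, nr_ranks, exe)
```

With something like this, a site could also pass override="srun --partition=mpi-core8" to force both the command and the partition, instead of relying on environment variables.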
> The mpi_cmd_for function only uses very simple mpirun commands (e.g. it
> only passes "-n" in the case of MVAPICH2), so maybe you can get away
> with defining mpirun as an alias for srun when running eb, as a
> workaround?
>
The alias doesn't seem to work when run via "eb". Interactively it works just
fine, except that it tries to use the wrong partition in SLURM. Some of my
nodes do not have IB interfaces, and the default partition is for "serial" jobs
that use TCP over GigE only.
Note: I had to force my buildpath to a shared filesystem because, when run
interactively, the mpirun command failed due to not finding the current
execution path in /dev/shm.
With the "eb" command:
$ eb BLACS-1.1-gmvapich2-1.8.2.eb --robot=. --force -l --buildpath=/home/treydock/.local/easybuild/build
<lots of output>
== 2014-08-25 10:33:48,771 main.run ERROR EasyBuild crashed with an error (at
easybuild/tools/run.py:382 in parse_cmd_output): cmd "mpirun -n 2
./EXE/xtc_CsameF77" exited with exitcode 127 and output:
/bin/bash: mpirun: command not found
Doing the same steps interactively:
$ which mpirun
alias mpirun='srun '
/usr/bin/srun
$ <Commands to load easybuild MODULEPATH>
$ module load gmvapich2/1.8.2
$ cd /home/treydock/.local/easybuild/build/BLACS/1.1/gmvapich2-1.8.2/BLACS/INSTALL
$ mpirun -n 2 ./EXE/xtc_CsameF77
srun: error: c0225: tasks 0-1: Illegal instruction (core dumped)
## c0225 != IB node
$ export SLURM_PARTITION="mpi-core8"
$ mpirun -n 2 ./EXE/xtc_CsameF77
If this routine does not complete successfully,
Do _NOT_ set TRANSCOMM = -DCSameF77
Set TRANSCOMM = -DCSameF77
Is there a way to set the alias in the easyconfig?
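The "/bin/bash: mpirun: command not found" error is probably because bash aliases are internal to the shell: they are not inherited by child processes and are not expanded in non-interactive shells by default. A possible stopgap is a small wrapper script named mpirun placed first on PATH, since PATH is inherited. Below is a minimal sketch of that idea, not something I have verified with eb; make_shim, env_with_shim and run_in_bash are hypothetical helper names.

```python
import os
import stat
import subprocess
import tempfile


def make_shim(name, target, shim_dir=None):
    """Write an executable script `name` that execs `target` with all args.

    Returns the directory holding the script (a fresh temp dir by default).
    """
    shim_dir = shim_dir or tempfile.mkdtemp(prefix="launcher-shim-")
    path = os.path.join(shim_dir, name)
    with open(path, "w") as handle:
        handle.write('#!/bin/sh\nexec %s "$@"\n' % target)
    # Add execute permission for user, group and others.
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return shim_dir


def env_with_shim(shim_dir, base_env=None):
    """Return a copy of the environment with `shim_dir` first on PATH."""
    env = dict(os.environ if base_env is None else base_env)
    env["PATH"] = shim_dir + os.pathsep + env.get("PATH", "")
    return env


def run_in_bash(cmd, env):
    """Run `cmd` in a non-interactive bash, where aliases are not expanded."""
    return subprocess.run(
        ["/bin/bash", "-c", cmd], env=env, capture_output=True, text=True
    )
```

For the case above that would be make_shim("mpirun", "srun"), followed by prepending the returned directory to PATH in the shell that runs eb. SLURM_PARTITION would still need to be exported to get the right partition.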
> mpi_cmd_for is actually a poor man's solution, we should be relying on
> our own mympirun "wrapper script" (see
> https://github.com/hpcugent/vsc-mympirun).
> But that wouldn't have helped in this case, since it doesn't know about
> srun either (yet).
>
>
> regards,
>
> Kenneth
>
> >
> > Thanks,
> > - Trey
> >
> > [1]
> > https://github.com/hpcugent/easybuild-easyconfigs/blob/master/easybuild/easyconfigs/b/BLACS/BLACS-1.1-gmvapich2-1.7.9a2.eb
> > [2]
> > https://github.com/hpcugent/easybuild-framework/blob/78690b0771ca971326fd81c20f1b25ed18d801a9/easybuild/tools/toolchain/mpi.py#L177
> >
> > =============================
> >
> > Trey Dockendorf
> > Systems Analyst I
> > Texas A&M University
> > Academy for Advanced Telecommunications and Learning Technologies