> We should probably enhance the mpi_cmd_for function such that it checks
> whether mpirun is available, and falls back to different commands if not,
> including srun.
> It's not terribly difficult to do this I think, if you're familiar with
> Python at least: rather than hardcoding "mpirun" when defining the
> "mpi_cmds" dictionary, it should insert whichever command is available
> first (I think?).
> 
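A minimal sketch of the "use whichever launcher is available first" idea. This is not EasyBuild's code. The names `first_available_launcher`, `build_launch_cmd` and `DEFAULT_LAUNCHERS`, and the preference order mpirun → mpiexec → srun, are all hypothetical. The `which` lookup is a parameter so it can be stubbed out. Note that `shutil.which` only searches `$PATH`, so a shell alias such as `alias mpirun='srun '` is not detected.

```python
import shutil
from typing import Callable, Optional, Sequence

# Hypothetical preference order; the real mpi_cmd_for hardcodes "mpirun".
DEFAULT_LAUNCHERS = ("mpirun", "mpiexec", "srun")


def first_available_launcher(
    candidates: Sequence[str] = DEFAULT_LAUNCHERS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the first candidate found on $PATH, or None if none is found."""
    for name in candidates:
        if which(name) is not None:
            return name
    return None


def build_launch_cmd(cmd: str, nr_ranks: int, launcher: str) -> str:
    """Build a simple launch command; mpirun, mpiexec and srun all accept -n."""
    return f"{launcher} -n {nr_ranks} {cmd}"
```

On a node like the one in this thread (no mpirun, /usr/bin/srun present, and assuming no mpiexec either), `first_available_launcher()` would return `"srun"`.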

That sounds about right.  Or make it a "Hash" (a dictionary, in Python terms) 
whose top-level keys are based on a "scheduler" config option, with "default" 
as the fallback that keeps the current behavior?  I imagine the defaults would 
work 99% of the time, but the ability to override all of them would be nice.  
For example, MVAPICH2 built with --with-pm=no requires "srun", while OpenMPI 
still builds mpirun even when built against the SLURM PMI library.  A way to 
customize the "mpirun" (or srun) command line would also be ideal (see the 
issue below about the wrong partition being used).  For now I can tweak the 
behavior using environment variables.
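A rough sketch of the scheduler-keyed dictionary described above, under stated assumptions. The top-level keys stand in for a hypothetical "scheduler" config option, and "default" approximates today's plain-mpirun behavior. The template strings, and the names `MPI_CMD_TEMPLATES`, `mpi_cmd_template` and `build_mpi_cmd`, are illustrative and are not copied from EasyBuild. Lookups go in this order: user override, then the scheduler entry, then "default".

```python
from typing import Dict, Optional

# Illustrative templates only; keys mirror a hypothetical "scheduler" option.
MPI_CMD_TEMPLATES: Dict[str, Dict[str, str]] = {
    "default": {
        "OpenMPI": "mpirun -n %(nr_ranks)d %(cmd)s",
        "MVAPICH2": "mpirun -n %(nr_ranks)d %(cmd)s",
    },
    "slurm": {
        # OpenMPI still provides mpirun when built against SLURM PMI...
        "OpenMPI": "mpirun -n %(nr_ranks)d %(cmd)s",
        # ...but MVAPICH2 built with --with-pm=no needs srun.
        "MVAPICH2": "srun -n %(nr_ranks)d %(cmd)s",
    },
}


def mpi_cmd_template(mpi_family: str, scheduler: str = "default",
                     overrides: Optional[Dict[str, str]] = None) -> str:
    """Pick a template: user override first, then scheduler, then 'default'."""
    if overrides and mpi_family in overrides:
        return overrides[mpi_family]
    per_scheduler = MPI_CMD_TEMPLATES.get(scheduler, {})
    if mpi_family in per_scheduler:
        return per_scheduler[mpi_family]
    return MPI_CMD_TEMPLATES["default"][mpi_family]


def build_mpi_cmd(cmd: str, nr_ranks: int, mpi_family: str,
                  scheduler: str = "default",
                  overrides: Optional[Dict[str, str]] = None) -> str:
    """Fill in the selected template with the command and rank count."""
    template = mpi_cmd_template(mpi_family, scheduler, overrides)
    return template % {"cmd": cmd, "nr_ranks": nr_ranks}
```

The override hook also covers the partition problem. For example, `overrides={"MVAPICH2": "srun -p mpi-core8 -n %(nr_ranks)d %(cmd)s"}` pins the job to the IB partition without adding a new code path.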

> The mpi_cmd_for function only uses very simple mpirun commands (e.g. it
> only passes "-n" in the case of MVAPICH2), so maybe you can get away with
> defining mpirun as an alias for srun when running eb, as a workaround?
> 

The alias doesn't seem to work when run via "eb".  Interactively it works just 
fine, except that it tries to use the wrong SLURM partition.  Some of my nodes 
do not have IB interfaces, and the default partition is for "serial" jobs that 
use TCP over GigE only.

Note: I had to force my buildpath onto a shared filesystem, because when run 
interactively the mpirun command failed: it could not find the current working 
directory, which was under /dev/shm.

With the "eb" command:

$ eb BLACS-1.1-gmvapich2-1.8.2.eb --robot=. --force -l --buildpath=/home/treydock/.local/easybuild/build
<lots of output>
== 2014-08-25 10:33:48,771 main.run ERROR EasyBuild crashed with an error (at easybuild/tools/run.py:382 in parse_cmd_output): cmd "mpirun -n 2 ./EXE/xtc_CsameF77" exited with exitcode 127 and output:
/bin/bash: mpirun: command not found

Doing the same steps interactively:
$ which mpirun
alias mpirun='srun '
        /usr/bin/srun
$ <Commands to load easybuild MODULEPATH>
$ module load gmvapich2/1.8.2
$ cd /home/treydock/.local/easybuild/build/BLACS/1.1/gmvapich2-1.8.2/BLACS/INSTALL
$ mpirun -n 2 ./EXE/xtc_CsameF77
srun: error: c0225: tasks 0-1: Illegal instruction (core dumped)

## c0225 != IB node

$ export SLURM_PARTITION="mpi-core8"
$ mpirun -n 2 ./EXE/xtc_CsameF77
If this routine does not complete successfully,
 Do _NOT_ set TRANSCOMM = -DCSameF77


 Set TRANSCOMM = -DCSameF77



Is there a way to set the alias in the easyconfig?
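The interactive fix above comes down to two things. The command has to name srun directly: bash does not expand aliases in non-interactive shells unless `expand_aliases` is set, and aliases are not inherited by child processes, which is consistent with the "mpirun: command not found" from eb. The partition also has to reach srun, which reads `SLURM_PARTITION` as if `--partition` had been given. The sketch below shows both. The helper names `slurm_launch_env` and `srun_cmd` are hypothetical, and the sketch is not an EasyBuild or easyconfig feature.

```python
import os
from typing import Dict, Mapping, Optional


def slurm_launch_env(partition: str,
                     base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with SLURM_PARTITION set.

    Mirrors the interactive 'export SLURM_PARTITION=...' workaround without
    modifying the caller's own environment (or the given base mapping).
    """
    env = dict(os.environ if base is None else base)
    env["SLURM_PARTITION"] = partition
    return env


def srun_cmd(cmd: str, nr_ranks: int, partition: Optional[str] = None) -> str:
    """Spell out srun explicitly, since aliases are not expanded non-interactively."""
    parts = ["srun"]
    if partition:
        parts.append(f"--partition={partition}")
    parts += ["-n", str(nr_ranks), cmd]
    return " ".join(parts)
```

Either approach could then be handed to something like `subprocess.run(srun_cmd("./EXE/xtc_CsameF77", 2), shell=True, env=slurm_launch_env("mpi-core8"))`. The environment variable keeps the command line unchanged, while the explicit flag makes the partition visible in the build log.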

> mpi_cmd_for is actually a poor man's solution; we should be relying on
> our own mympirun "wrapper script" (see
> https://github.com/hpcugent/vsc-mympirun).
> But that wouldn't have helped in this case, since it doesn't know about
> srun either (yet).
> 
> 
> regards,
> 
> Kenneth
> 
> >
> > Thanks,
> > - Trey
> >
> >
> > =============================
> >
> > Trey Dockendorf
> > Systems Analyst I
> > Texas A&M University
> > Academy for Advanced Telecommunications and Learning Technologies
> > Phone: (979)458-2396
> 
> 
