On 25/08/14 17:47, Trey Dockendorf wrote:
We should probably enhance the mpi_cmd_for function so that it checks whether mpirun is available, and falls back to different commands if not, including srun.
It's not terribly difficult to do this I think, if you're familiar with Python at least: rather than hardcoding "mpirun" when defining the "mpi_cmds" dictionary, it should insert whichever command is available first (I think?).
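A minimal sketch of that idea. It assumes Python 3.3+ for `shutil.which`, and the names (`LAUNCHER_CANDIDATES`, `first_available`, `mpi_cmd_for_sketch`) are hypothetical, not EasyBuild's actual code: the candidate launchers are tried in a fixed order and the first one found on `PATH` is used.

```python
import shutil

# Preference order matters and must not depend on dict key order,
# so the candidate launchers are kept in a tuple (hypothetical list).
LAUNCHER_CANDIDATES = ("mpirun", "srun")


def first_available(candidates=LAUNCHER_CANDIDATES, path=None):
    """Return the first candidate found on the search path, or None.

    `path` defaults to $PATH (shutil.which behaviour); pass a string to
    search somewhere else.
    """
    for name in candidates:
        if shutil.which(name, path=path) is not None:
            return name
    return None


def mpi_cmd_for_sketch(cmd, nr_ranks, candidates=LAUNCHER_CANDIDATES, path=None):
    """Hypothetical variant of mpi_cmd_for that picks the launcher at runtime."""
    launcher = first_available(candidates, path)
    if launcher is None:
        raise RuntimeError("no MPI launcher found, tried: %s" % ", ".join(candidates))
    # mpirun (as used here) and srun both take "-n <count>" for the number of tasks.
    return "%s -n %d %s" % (launcher, nr_ranks, cmd)


if __name__ == "__main__":
    try:
        print(mpi_cmd_for_sketch("./EXE/xtc_CsameF77", 2))
    except RuntimeError as err:
        print(err)
```

Keeping the preference order in a tuple sidesteps the question of dictionary key order entirely; the dictionary can still hold the per-MPI templates.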

That sounds about right. Or make it a default "Hash" (or whatever the Python term is for a key-value data structure)
dictionary (or 'dict')


and the top-level keys would be based on a "scheduler" config option, with "default" as the fallback key (the current behavior)?
I can imagine the defaults working 99% of the time, but having the ability to override them all would be nice. For example,
MVAPICH2 built with --with-pm=no requires the use of "srun", but OpenMPI still builds mpirun even when built against the SLURM
PMI library. A way to customize the "mpirun" (or srun) command would also be ideal (see the issue below regarding the wrong
partition being used). For now I can tweak the behavior using environment variables.
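One way to lay out the scheduler-keyed variant, with per-site overrides taking precedence. This is a hypothetical sketch: the "scheduler" option, the `MPI_CMD_TEMPLATES` layout and the MPI family names are illustrative, not existing EasyBuild configuration; the `%(nr_ranks)d` / `%(cmd)s` placeholders are plain Python `%` formatting.

```python
# Hypothetical layout: the top-level key is the value of a hypothetical
# "scheduler" config option; "default" reproduces the current mpirun-only behaviour.
MPI_CMD_TEMPLATES = {
    "default": {
        "OpenMPI": "mpirun -n %(nr_ranks)d %(cmd)s",
        "MVAPICH2": "mpirun -n %(nr_ranks)d %(cmd)s",
    },
    # Assumption: MVAPICH2 built with --with-pm=no has no usable mpirun, so use srun.
    # OpenMPI is left out here, so it falls back to the "default" entry.
    "slurm": {
        "MVAPICH2": "srun -n %(nr_ranks)d %(cmd)s",
    },
}


def mpi_cmd_template(mpi_family, scheduler="default", overrides=None):
    """Resolve a template: user override, then scheduler-specific, then "default"."""
    if overrides and mpi_family in overrides:
        return overrides[mpi_family]
    # The fallback order is spelled out explicitly instead of relying on dict ordering.
    for key in (scheduler, "default"):
        templates = MPI_CMD_TEMPLATES.get(key, {})
        if mpi_family in templates:
            return templates[mpi_family]
    raise KeyError("no MPI command template for %r" % mpi_family)


def mpi_cmd(mpi_family, cmd, nr_ranks, scheduler="default", overrides=None):
    """Fill in the resolved template for one command and rank count."""
    template = mpi_cmd_template(mpi_family, scheduler, overrides)
    return template % {"nr_ranks": nr_ranks, "cmd": cmd}


if __name__ == "__main__":
    exe = "./EXE/xtc_CsameF77"
    print(mpi_cmd("MVAPICH2", exe, 2, scheduler="slurm"))
    print(mpi_cmd("OpenMPI", exe, 2, scheduler="slurm"))
    # A per-site override, e.g. forcing an InfiniBand partition via srun's -p option.
    site = {"MVAPICH2": "srun -p mpi-core8 -n %(nr_ranks)d %(cmd)s"}
    print(mpi_cmd("MVAPICH2", exe, 2, scheduler="slurm", overrides=site))
```

With `scheduler="slurm"`, MVAPICH2 resolves to `srun -n 2 ./EXE/xtc_CsameF77`, OpenMPI falls back to `mpirun -n 2 ./EXE/xtc_CsameF77`, and an unrecognized scheduler name behaves exactly like the current hardcoded mpirun.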

Dictionaries are unordered w.r.t. keys, so it would have to be a little bit different, but it's definitely doable.


The mpi_cmd_for function only uses very simple mpirun commands (e.g. it only passes "-n" in the case of MVAPICH2), so maybe you can get away with defining mpirun as an alias for srun when running eb, as a workaround?

The alias doesn't seem to work when run via "eb". Interactively it works just fine, except that it
tries to use the wrong SLURM partition. Some of my nodes do not have InfiniBand (IB) interfaces, and the
default partition is for "serial" jobs that use TCP over GigE only.


Note: I had to force my buildpath onto a shared filesystem because, when run
interactively, the mpirun command failed: it could not find the current working
path in /dev/shm.

With the "eb" command:

$ eb BLACS-1.1-gmvapich2-1.8.2.eb --robot=. --force -l --buildpath=/home/treydock/.local/easybuild/build
<lots of output>
== 2014-08-25 10:33:48,771 main.run ERROR EasyBuild crashed with an error (at easybuild/tools/run.py:382 in parse_cmd_output): cmd "mpirun -n 2 ./EXE/xtc_CsameF77" exited with exitcode 127 and output:
/bin/bash: mpirun: command not found

Aliases don't get passed down to subshells, which is why this doesn't work.
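An alias only exists inside the interactive shell that defined it; the shell that eb spawns looks `mpirun` up on `PATH`, so it needs a real executable file there. The Python sketch below (the `make_wrapper` helper is hypothetical) writes such a forwarding script into a temporary directory and checks that `shutil.which` finds it once that directory is prepended to the search path.

```python
import os
import shutil
import stat
import tempfile


def make_wrapper(directory, name="mpirun", target="srun"):
    """Write an executable bash script `name` that forwards all arguments to `target`."""
    path = os.path.join(directory, name)
    with open(path, "w") as fh:
        fh.write("#!/bin/bash\n")
        # Quoted "$@" passes every argument through unchanged.
        fh.write('exec %s "$@"\n' % target)
    # Add the owner-execute bit so the shell (and shutil.which) treat it as a command.
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


if __name__ == "__main__":
    wrapper_dir = tempfile.mkdtemp()
    make_wrapper(wrapper_dir)
    # Prepend the wrapper directory, as `PATH=$PWD:$PATH eb ...` would.
    search_path = wrapper_dir + os.pathsep + os.environ.get("PATH", "")
    print(shutil.which("mpirun", path=search_path))
```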


Doing the same steps interactively:
$ which mpirun
alias mpirun='srun '
         /usr/bin/srun
$ <Commands to load easybuild MODULEPATH>
$ module load gmvapich2/1.8.2
$ cd /home/treydock/.local/easybuild/build/BLACS/1.1/gmvapich2-1.8.2/BLACS/INSTALL
$ mpirun -n 2 ./EXE/xtc_CsameF77
srun: error: c0225: tasks 0-1: Illegal instruction (core dumped)

## c0225 != IB node

$ export SLURM_PARTITION="mpi-core8"
$ mpirun -n 2 ./EXE/xtc_CsameF77
If this routine does not complete successfully,
  Do _NOT_ set TRANSCOMM = -DCSameF77


  Set TRANSCOMM = -DCSameF77
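Environment variables, unlike aliases, are inherited by child processes, which is why exporting `SLURM_PARTITION` reaches srun (srun treats it like `-p`/`--partition`). A small Python sketch of setting it for a subprocess; `env_with_partition` is a hypothetical helper, and a Python one-liner stands in for the real mpirun/srun call so the example runs without SLURM.

```python
import os
import subprocess
import sys


def env_with_partition(partition, base_env=None):
    """Return a copy of the environment with SLURM_PARTITION set.

    The caller's environment (or `base_env`) is copied, not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["SLURM_PARTITION"] = partition
    return env


if __name__ == "__main__":
    env = env_with_partition("mpi-core8")
    # Stand-in for the real launcher call: show that the child sees the variable.
    child = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ['SLURM_PARTITION'])"],
        env=env, capture_output=True, text=True, check=True,
    )
    print(child.stdout.strip())  # mpi-core8
```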



Is there a way to set the alias in the easyconfig?

No, and I don't think this makes a lot of sense either. The proper way is to enhance mpi_cmd_for so it knows about the 'srun' alternative.

Based on what you have here, try creating a wrapper script named mpirun as a workaround, and make it available in your PATH when running 'eb':

    printf '#!/bin/bash\nsrun "$@"\n' > mpirun; chmod u+x mpirun
    PATH=$PWD:$PATH SLURM_PARTITION="mpi-core8" eb BLACS-1.1-gmvapich2-1.8.2.eb --robot=. --force -l --buildpath=/home/treydock/.local/easybuild/build


Does that work at all?



regards,

Kenneth


mpi_cmd_for is actually a poor man's solution; we should be relying on our own mympirun "wrapper script" (see https://github.com/hpcugent/vsc-mympirun).
But that wouldn't have helped in this case, since it doesn't know about srun either (yet).


regards,

Kenneth

Thanks,
- Trey


=============================

Trey Dockendorf
Systems Analyst I
Texas A&M University
Academy for Advanced Telecommunications and Learning Technologies
Phone: (979)458-2396
Email: [email protected]
Jabber: [email protected]
