Hi Gilles,
> srun -N 1 -n 1 orted
> that is expected to fail, but it should at least find all its
> dependencies and start
This was quite illuminating!
andrej@terra:~/system/tests/MPI$ srun -N 1 -n 1 orted
srun: /usr/local/lib/slurm/switch_generic.so: Incompatible Slurm plugin version (20.02.6)
srun: error: Couldn't load specified plugin name for switch/generic: Incompatible plugin version
srun: /usr/local/lib/slurm/mpi_pmix.so: Incompatible Slurm plugin version (20.02.6)
srun: error: Couldn't load specified plugin name for mpi/pmix: Incompatible plugin version
srun: error: cannot create mpi context for mpi/pmix
srun: error: invalid MPI type 'pmix', --mpi=list for acceptable types
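For anyone hitting the same errors, here is a rough sketch of how the stale plugins could be picked out of that output. The helper name stale_plugins is made up for illustration, and it assumes each srun message arrives on a single line in the "Incompatible Slurm plugin version (X)" form shown above.

```shell
# Hypothetical helper (not part of Slurm): read srun's messages on stdin
# and print "<plugin path> <version>" for every plugin that srun rejected
# as built for a different Slurm version.
stale_plugins() {
    sed -n 's/^srun: \([^:]*\): Incompatible Slurm plugin version (\(.*\))$/\1 \2/p'
}

# Possible usage (srun writes these messages to stderr, hence 2>&1):
#   srun -N 1 -n 1 orted 2>&1 | stale_plugins
```

Each line it prints names a plugin file left over from another build, which is a hint about what to clean out before reinstalling.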
So it looks like there were conflicting Slurm versions running:
20.02.6 (slurmdbd) and 20.11.3 (slurmctld/slurmd). I deleted everything
Slurm-related in /usr/local and reconfigured, rebuilt, and reinstalled
20.11.3. Now I'm getting this:
andrej@terra:~$ srun -N 1 -n 1 orted
srun: error: Couldn't find the specified plugin name for mpi/pmix looking at all files
srun: error: cannot find mpi plugin for mpi/pmix
srun: error: cannot create mpi context for mpi/pmix
srun: error: invalid MPI type 'pmix', --mpi=list for acceptable types
It seems that Slurm doesn't see PMIx:
andrej@terra:~$ srun --mpi=list
srun: MPI types are...
srun: cray_shasta
srun: none
srun: pmi2
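The same check can be scripted, which is handy after each rebuild. This is only a sketch: has_mpi_type is a hypothetical helper, and it assumes the "srun: <type>" line format shown in the listing above.

```shell
# Hypothetical helper (not part of Slurm): read the output of
# `srun --mpi=list` on stdin and succeed only if the MPI type given
# as $1 appears as its own entry in the list.
has_mpi_type() {
    awk -v want="$1" '$1 == "srun:" && $2 == want { found = 1 } END { exit !found }'
}

# Possible usage (the listing is assumed to go to stderr, hence 2>&1):
#   srun --mpi=list 2>&1 | has_mpi_type pmix && echo "pmix available"
```

With the listing above it would fail for pmix and succeed for pmi2.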
I'll try to point Slurm at Open MPI's internal PMIx and rebuild, but
I'm posting this now in case I'm going down a rabbit hole and someone
has a better idea.
Cheers,
Andrej