Hi Gilles,

srun -N 1 -n 1 orted

That is expected to fail, but it should at least find all its
dependencies and start.

This was quite illuminating!

andrej@terra:~/system/tests/MPI$ srun -N 1 -n 1 orted
srun: /usr/local/lib/slurm/switch_generic.so: Incompatible Slurm plugin version (20.02.6)
srun: error: Couldn't load specified plugin name for switch/generic: Incompatible plugin version
srun: /usr/local/lib/slurm/mpi_pmix.so: Incompatible Slurm plugin version (20.02.6)
srun: error: Couldn't load specified plugin name for mpi/pmix: Incompatible plugin version
srun: error: cannot create mpi context for mpi/pmix
srun: error: invalid MPI type 'pmix', --mpi=list for acceptable types

So it looks like conflicting Slurm versions were running: 20.02.6 (slurmdbd) and 20.11.3 (slurmctld/slurmd). I deleted everything Slurm-related in /usr/local, then reconfigured, rebuilt, and reinstalled 20.11.3. Now I'm getting this:
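A quick way to catch this kind of mismatch before digging further is to ask every Slurm component for its version. The sketch below is a hypothetical helper (not part of Slurm), assuming Python 3.7+ and that `srun --version` and the daemons' `-V` flag print a string such as `slurm 20.11.3`. Components that aren't on PATH are skipped (slurmdbd often lives on another host, and the daemons may sit in an sbin directory).

```python
import re
import shutil
import subprocess

# Commands that print the installed Slurm version (assumed output: "slurm X.Y.Z").
VERSION_COMMANDS = {
    "srun": ["srun", "--version"],
    "slurmd": ["slurmd", "-V"],
    "slurmctld": ["slurmctld", "-V"],
    "slurmdbd": ["slurmdbd", "-V"],
}


def parse_version(text):
    """Extract the first X.Y.Z version string, e.g. 'slurm 20.11.3' -> '20.11.3'."""
    match = re.search(r"\d+\.\d+\.\d+", text)
    return match.group(0) if match else None


def collect_versions():
    """Run each version command found on PATH and return {component: version}."""
    versions = {}
    for name, cmd in VERSION_COMMANDS.items():
        if shutil.which(cmd[0]) is None:
            continue  # component not installed (or not on PATH) on this host
        result = subprocess.run(cmd, capture_output=True, text=True)
        versions[name] = parse_version(result.stdout + result.stderr)
    return versions


def versions_consistent(versions):
    """True when every component that reported a version reports the same one."""
    found = {v for v in versions.values() if v is not None}
    return len(found) <= 1


def main():
    versions = collect_versions()
    for name, version in sorted(versions.items()):
        print(f"{name:10s} {version}")
    if not versions_consistent(versions):
        print("WARNING: mixed Slurm versions installed")


if __name__ == "__main__":
    main()
```

On a setup like the one above, it would have listed 20.02.6 next to 20.11.3 and printed the warning.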

andrej@terra:~$ srun -N 1 -n 1 orted
srun: error: Couldn't find the specified plugin name for mpi/pmix looking at all files
srun: error: cannot find mpi plugin for mpi/pmix
srun: error: cannot create mpi context for mpi/pmix
srun: error: invalid MPI type 'pmix', --mpi=list for acceptable types

It seems that Slurm doesn't see PMIx:

andrej@terra:~$ srun --mpi=list
srun: MPI types are...
srun: cray_shasta
srun: none
srun: pmi2
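To check this from the plugin side, the sketch below (again a hypothetical helper, assuming Python 3.7+) parses the `srun --mpi=list` output and looks for `mpi_pmix*.so` files in /usr/local/lib/slurm, the plugin directory seen in the error messages above. My understanding is that Slurm only builds its pmix plugin when configure finds a PMIx installation, so an empty result would point at the build rather than at runtime configuration.

```python
import glob
import os
import shutil
import subprocess

# Plugin directory taken from the error messages; adjust if PluginDir differs.
PLUGIN_DIR = "/usr/local/lib/slurm"


def parse_mpi_list(output):
    """Turn `srun --mpi=list` output into a list of MPI plugin types."""
    types = []
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("srun:"):
            line = line[len("srun:"):].strip()
        if not line or line.startswith("MPI types are"):
            continue  # skip blank lines and the header line
        types.append(line)
    return types


def pmix_plugins(plugin_dir=PLUGIN_DIR):
    """List any mpi_pmix*.so files installed in the Slurm plugin directory."""
    return sorted(glob.glob(os.path.join(plugin_dir, "mpi_pmix*.so")))


def main():
    if shutil.which("srun") is None:
        print("srun not found on PATH")
        return
    # srun may print this listing on stderr, so read both streams.
    result = subprocess.run(["srun", "--mpi=list"], capture_output=True, text=True)
    types = parse_mpi_list(result.stdout + result.stderr)
    print("MPI types:", ", ".join(types))
    if not any(t.startswith("pmix") for t in types):
        print("pmix missing; plugin files found:", pmix_plugins() or "none")


if __name__ == "__main__":
    main()
```

If pmix is absent from both the listing and the plugin directory, the next step would be rebuilding Slurm with configure pointed at a PMIx installation (`--with-pmix=<path>`, where `<path>` is a placeholder).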

I'll try pointing Slurm at Open MPI's internal PMIx and rebuilding, but I'm posting this now in case I'm going down a rabbit hole and someone has a better idea.

Cheers,
Andrej
