Ole,

Can you please copy this over to an issue in https://github.com/easybuilders/easybuild-easyconfigs/issues so we can keep track of things there? It is also being discussed in Slack, but we should really have the discussion and progress in a location where anyone can find it.
If you don't have a GitHub account, can you give me permission to copy over the content of your email to create the issue?

Thanks,
Alan

On Wed, 25 May 2022 at 10:54, Ole Holm Nielsen <[email protected]> wrote:

> Hi Easybuilders,
>
> I'm testing the upgrade of our compute nodes from Almalinux 8.5 to 8.6
> (the RHEL 8 clone similar to Rocky Linux).
>
> We have found that *all* MPI codes built with any of the Intel toolchains
> intel/2020b or intel/2021b fail after the 8.5 to 8.6 upgrade. The codes
> fail also on login nodes, so the Slurm queue system is not involved.
> The FOSS toolchains foss/2020b and foss/2021b work perfectly on EL 8.6,
> however.
>
> My simple test uses the attached trivial MPI Hello World code running on a
> single node:
>
> $ module load intel/2021b
> $ mpicc mpi_hello_world.c
> $ mpirun ./a.out
>
> Now the mpirun command enters an infinite loop (running many minutes) and
> we see these processes with "ps":
>
> /bin/sh /home/modules/software/impi/2021.4.0-intel-compilers-2021.4.0/mpi/2021.4.0/bin/mpirun ./a.out
> mpiexec.hydra ./a.out
>
> The mpiexec.hydra process doesn't respond to 15/SIGTERM and I have to kill
> it with 9/SIGKILL. I've tried to enable debugging output with
> export I_MPI_HYDRA_DEBUG=1
> export I_MPI_DEBUG=5
> but nothing gets printed from this.
>
> Question: Has anyone tried an EL 8.6 Linux with the Intel toolchain and
> mpiexec.hydra? Can you suggest how I may debug this issue?
>
> OS information:
>
> $ cat /etc/redhat-release
> AlmaLinux release 8.6 (Sky Tiger)
> $ uname -r
> 4.18.0-372.9.1.el8.x86_64
>
> Thanks a lot,
> Ole
>
> --
> Ole Holm Nielsen
> PhD, Senior HPC Officer
> Department of Physics, Technical University of Denmark

