Ole,

Can you please copy this over to an issue in
https://github.com/easybuilders/easybuild-easyconfigs/issues so we can keep
track of things there? It is also being discussed in Slack but we should
really have the discussion and progress in a location where anyone can find
it.

If you don't have a GitHub account, can you give me permission to copy over
the content of your email to create the issue?

Thanks,

Alan

On Wed, 25 May 2022 at 10:54, Ole Holm Nielsen <[email protected]>
wrote:

> Hi Easybuilders,
>
> I'm testing the upgrade of our compute nodes from AlmaLinux 8.5 to 8.6
> (the RHEL 8 clone similar to Rocky Linux).
>
> We have found that *all* MPI codes built with any of the Intel toolchains
> intel/2020b or intel/2021b fail after the 8.5 to 8.6 upgrade.  The codes
> also fail on login nodes, so the Slurm queue system is not involved.
> The FOSS toolchains foss/2020b and foss/2021b work perfectly on EL 8.6,
> however.
>
> My simple test uses the attached trivial MPI Hello World code running on a
> single node:
>
> $ module load intel/2021b
> $ mpicc mpi_hello_world.c
> $ mpirun ./a.out
>
> Now the mpirun command enters an infinite loop (running many minutes) and
> we see these processes with "ps":
>
> /bin/sh /home/modules/software/impi/2021.4.0-intel-compilers-2021.4.0/mpi/2021.4.0/bin/mpirun ./a.out
> mpiexec.hydra ./a.out
>
> The mpiexec.hydra process doesn't respond to 15/SIGTERM and I have to kill
> it with 9/SIGKILL.  I've tried to enable debugging output with
> export I_MPI_HYDRA_DEBUG=1
> export I_MPI_DEBUG=5
> but nothing gets printed from this.
>
> Question: Has anyone tried an EL 8.6 Linux with the Intel toolchain and
> mpiexec.hydra?  Can you suggest how I may debug this issue?
>
> OS information:
>
> $ cat /etc/redhat-release
> AlmaLinux release 8.6 (Sky Tiger)
> $ uname -r
> 4.18.0-372.9.1.el8.x86_64
>
> Thanks a lot,
> Ole
>
> --
> Ole Holm Nielsen
> PhD, Senior HPC Officer
> Department of Physics, Technical University of Denmark
