Hi Alan,

Thanks a lot for the feedback!  I've opened a new issue now:
https://github.com/easybuilders/easybuild-easyconfigs/issues/15651

Best regards,
Ole

On 6/9/22 10:52, Alan O'Cais wrote:
Ole,

Can you please copy this over to an issue at https://github.com/easybuilders/easybuild-easyconfigs/issues so we can keep track of things there? It is also being discussed in Slack, but we should really have the discussion and progress in a location where anyone can find it.

If you don't have a GitHub account, can you give me permission to copy over the content of your email to create the issue?

Thanks,

Alan

On Wed, 25 May 2022 at 10:54, Ole Holm Nielsen <[email protected]> wrote:

    Hi Easybuilders,

    I'm testing the upgrade of our compute nodes from Almalinux 8.5 to 8.6
    (the RHEL 8 clone similar to Rocky Linux).

    We have found that *all* MPI codes built with any of the Intel toolchains
    intel/2020b or intel/2021b fail after the 8.5 to 8.6 upgrade.  The codes
    fail also on login nodes, so the Slurm queue system is not involved.
    The FOSS toolchains foss/2020b and foss/2021b work perfectly on EL 8.6,
    however.

    My simple test uses the attached trivial MPI Hello World code running on a
    single node:

    $ module load intel/2021b
    $ mpicc mpi_hello_world.c
    $ mpirun ./a.out

    Now the mpirun command enters an infinite loop (running many minutes) and
    we see these processes with "ps":

    /bin/sh /home/modules/software/impi/2021.4.0-intel-compilers-2021.4.0/mpi/2021.4.0/bin/mpirun ./a.out
    mpiexec.hydra ./a.out

    The mpiexec.hydra process doesn't respond to 15/SIGTERM and I have to kill
    it with 9/SIGKILL.  I've tried to enable debugging output with
    export I_MPI_HYDRA_DEBUG=1
    export I_MPI_DEBUG=5
    but nothing gets printed from this.

    Question: Has anyone tried an EL 8.6 Linux with the Intel toolchain and
    mpiexec.hydra?  Can you suggest how I may debug this issue?

    OS information:

    $ cat /etc/redhat-release
    AlmaLinux release 8.6 (Sky Tiger)
    $ uname -r
    4.18.0-372.9.1.el8.x86_64

    Thanks a lot,
    Ole

-- Ole Holm Nielsen
    PhD, Senior HPC Officer
    Department of Physics, Technical University of Denmark
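
For anyone reproducing the hang described above, a minimal sketch of a wrapper that stops a stuck run automatically instead of killing mpiexec.hydra by hand. It assumes GNU coreutils "timeout" is available (as on EL 8); the function name run_with_timeout and the 5-second grace period are illustrative choices, not something from the original report.

```shell
#!/bin/sh
# Hypothetical helper (name and grace period are assumptions):
# run a command with a time limit, sending SIGTERM first and
# SIGKILL 5 seconds later if the command is still alive.
# Usage: run_with_timeout SECONDS COMMAND [ARGS...]
run_with_timeout() {
    limit=$1
    shift
    # GNU timeout sends SIGTERM after $limit seconds; --kill-after
    # escalates to SIGKILL if the command ignores SIGTERM.
    timeout --kill-after=5 "$limit" "$@"
}

# Example (not executed here):
#   run_with_timeout 60 mpirun ./a.out
#   echo "exit status: $?"   # 124 means the time limit was hit
```

A nonzero exit status alone only shows that the run did not finish in time; it does not say where mpiexec.hydra is stuck.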


--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark,
Fysikvej Building 309, DK-2800 Kongens Lyngby, Denmark
E-mail: [email protected]
Homepage: http://dcwww.fysik.dtu.dk/~ohnielse/
Mobile: (+45) 5180 1620
