Dear Frank,

You're probably running into this on a RHEL8 derivative?

Intel MPI 2019 update 5 is known to be broken on those OS versions, see https://github.com/easybuilders/easybuild-easyconfigs/issues/11762 .

The best way to handle this is probably to use a custom intel-2019b.eb that uses Intel MPI 2019 update 7, which should work...
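One way to prepare such a custom easyconfig is to copy intel-2019b.eb and change only the version of its impi dependency. The sketch below does that on the easyconfig text. It assumes the dependency is written as a tuple starting with ('impi', '<version>', ...), that '2019.7.217' is the version string for Intel MPI 2019 update 7, and that a matching impi easyconfig for iccifort-2019.5.281 already exists or is written alongside it. swap_impi_version is a hypothetical helper, not part of EasyBuild. The current version in the excerpt, 2018.5.288, is taken from the build directory in Frank's log.

```python
import re

# Matches the start of an impi dependency tuple such as ('impi', '2018.5.288', ...
# group 1: "('impi', "   group 2: the quote character   group 3: the version string
IMPI_DEP = re.compile(r"""(\(\s*['"]impi['"]\s*,\s*)(['"])([^'"]+)\2""")


def swap_impi_version(easyconfig_text: str, new_version: str) -> str:
    """Return a copy of easyconfig_text with the impi dependency version replaced.

    Raises ValueError unless exactly one impi dependency is found, so an
    unexpected layout fails loudly instead of yielding an unchanged copy.
    """
    found = len(IMPI_DEP.findall(easyconfig_text))
    if found != 1:
        raise ValueError(f"expected exactly one impi dependency, found {found}")
    return IMPI_DEP.sub(
        lambda m: f"{m.group(1)}{m.group(2)}{new_version}{m.group(2)}",
        easyconfig_text,
    )


if __name__ == "__main__":
    # Illustrative excerpt only; the real intel-2019b.eb contains more entries.
    excerpt = (
        "dependencies = [\n"
        "    ('impi', '2018.5.288', '', ('iccifort', '2019.5.281')),\n"
        "]\n"
    )
    # '2019.7.217' is an ASSUMED version string for Intel MPI 2019 update 7.
    print(swap_impi_version(excerpt, "2019.7.217"))
```

The regex requires a quote directly before impi, so toolchain references such as ('iimpi', version) elsewhere in the file are left alone.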


regards,

Kenneth

On 17/02/2022 09:52, Heckes, Frank wrote:
Hi all,

I couldn't find a solution to my problem, either in the mailing list archive or via Google. I tried to build intel-2019b.eb; the process runs successfully until it reaches the sanity check:

== sanity checking...
== ... (took 1 secs)
== FAILED: Installation ended unsuccessfully (build directory: /opt/local/easybuild/build/impi/2018.5.288/iccifort-2019.5.281): build failed (first 300 chars): Sanity check failed: sanity check command mpirun -n 36 /opt/local/easybuild/build/impi/2018.5.288/iccifort-2019.5.281/mpi_test exited with code 11 (output:

Since iccifort-2019.5.281 is already available, I loaded that module in another session. Running the test manually produces the errors below (see ‘Errors without FI- environment variables’). Setting the environment variable with export FI_PROVIDER=tcp fixes the problem, and the test now completes:

mpirun -np 36 /opt/local/easybuild/build/impi/2018.5.288/iccifort-2019.5.281/mpi_test

Hello world: rank 0 of 36 running on atlas52
Hello world: rank 1 of 36 running on atlas52
Hello world: rank 2 of 36 running on atlas52
. . .

Setting FI_PROVIDER=verbs makes the error appear again, even though the node has a working OFED software stack and an IP address assigned to the HCA, and is operational for other MPI applications.
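To compare providers without retyping the export each time, the manual test can be repeated with FI_PROVIDER set only in the environment handed to the child process. The sketch below assumes Python 3.7+ and that mpirun is in PATH after loading the modules; run_with_provider and sweep_providers are hypothetical helper names, and the mpi_test path and rank count are copied from the manual run and are site-specific.

```python
import os
import shutil
import subprocess
import sys
from typing import Dict, Iterable, List


def run_with_provider(cmd: List[str], provider: str) -> subprocess.CompletedProcess:
    """Run cmd with FI_PROVIDER set for the child process only.

    The parent's os.environ is copied, not modified, so repeated calls
    with different providers do not interfere with each other.
    """
    env = dict(os.environ, FI_PROVIDER=provider)
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=False)


def sweep_providers(cmd: List[str], providers: Iterable[str]) -> Dict[str, int]:
    """Return {provider: exit code of cmd} for each libfabric provider name."""
    return {p: run_with_provider(cmd, p).returncode for p in providers}


if __name__ == "__main__":
    # Same command as the manual test; path and rank count are site-specific.
    test_cmd = ["mpirun", "-np", "36",
                "/opt/local/easybuild/build/impi/2018.5.288/iccifort-2019.5.281/mpi_test"]
    if shutil.which("mpirun") is None:
        print("mpirun not in PATH; load the iccifort/impi modules first", file=sys.stderr)
    else:
        for provider, code in sweep_providers(test_cmd, ["tcp", "verbs"]).items():
            print(f"FI_PROVIDER={provider}: exit code {code}")
```

An exit code of 0 for tcp alongside a nonzero code for verbs would reproduce the behaviour described above in a single run.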

Two questions:

  * How can I set up the environment variables so that eb uses them during
    the test? (Running export FI_PROVIDER=…; eb intel-2019b.eb --robot
    doesn't help.)
  * Although I can see the verbs provider (running fi_info), I still run
    into an error. Did I miss a dependency of Intel MPI?
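For the second question, it may help to check exactly which providers the fi_info first in PATH reports. The parser below assumes the `fi_info -l` output format in which each provider name sits on an unindented line ending in a colon, followed by indented details; parse_fi_info_list and available_providers are hypothetical helpers. Intel MPI 2019 may use its own bundled libfabric rather than the system one, so the fi_info found in PATH is not necessarily the copy whose providers mpirun sees.

```python
import shutil
import subprocess
from typing import List


def parse_fi_info_list(output: str) -> List[str]:
    """Extract provider names from `fi_info -l` output.

    Assumed format: each provider starts on an unindented line such as
    'verbs:', followed by indented detail lines like '    version: ...'.
    """
    providers = []
    for line in output.splitlines():
        stripped = line.rstrip()
        # An unindented line ending in ':' is taken to be a provider name.
        if stripped and not stripped[0].isspace() and stripped.endswith(":"):
            providers.append(stripped[:-1])
    return providers


def available_providers() -> List[str]:
    """Run the first `fi_info` in PATH with -l; return [] if it is not installed."""
    if shutil.which("fi_info") is None:
        return []
    result = subprocess.run(["fi_info", "-l"], capture_output=True, text=True, check=False)
    return parse_fi_info_list(result.stdout)


if __name__ == "__main__":
    names = available_providers()
    print("providers:", ", ".join(names) or "(none found)")
    print("verbs listed:", "verbs" in names)
```

Running it once with only the system modules loaded and once with the impi module loaded would show whether the two libfabric copies disagree.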

Many thanks in advance for any help and advice.

Cheers,

-Frank Heckes

------------------------------   Errors without FI- environment variables

mpirun -np 36 /opt/local/easybuild/build/impi/2018.5.288/iccifort-2019.5.281/mpi_test

Abort(1091471) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(703).......:
MPID_Init(923)..............:
MPIDI_OFI_mpi_init_hook(883): OFI addrinfo() failure

Abort(1091471) on node 11 (rank 11 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(703).......:
MPID_Init(923)..............:
MPIDI_OFI_mpi_init_hook(883): OFI addrinfo() failure

Abort(1091471) on node 12 (rank 12 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(703).......:
MPID_Init(923)..............:
MPIDI_OFI_mpi_init_hook(883): OFI addrinfo() failure

Abort(1091471) on node 14 (rank 14 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(703).......:
MPID_Init(923)..............:
MPIDI_OFI_mpi_init_hook(883): OFI addrinfo() failure

