Dear Frank,
You're probably running into this on a RHEL8 derivative?
Intel MPI 2019 update 5 is known to be broken on those OS versions, see
https://github.com/easybuilders/easybuild-easyconfigs/issues/11762 .
The best way to handle this is probably to use a custom intel-2019b.eb
that uses Intel MPI 2019 update 7, which should work...
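A rough sketch of that workflow follows. It assumes the eb command is on your PATH and relies on its --search, --copy-ec and --robot options; the directory name custom-ec and the helper function name are only placeholders, and the impi version in the copied file still has to be edited by hand.

```shell
#!/bin/sh
# Placeholder directory for the customised easyconfig (hypothetical name).
EC_DIR="${EC_DIR:-custom-ec}"

# Hypothetical helper: copy the stock intel-2019b.eb so it can be edited locally.
prepare_custom_ec() {
    if ! command -v eb >/dev/null 2>&1; then
        echo "eb not found; load EasyBuild first"
        return 1
    fi
    mkdir -p "$EC_DIR"
    # List the Intel MPI 2019 easyconfigs that EasyBuild knows about.
    eb --search impi-2019
    # Copy the stock toolchain easyconfig into the working directory.
    eb --copy-ec intel-2019b.eb "$EC_DIR/intel-2019b.eb"
    echo "Now edit the impi version in $EC_DIR/intel-2019b.eb, then run:"
    echo "  eb $EC_DIR/intel-2019b.eb --robot"
}
```

Calling prepare_custom_ec in a shell where EasyBuild is loaded leaves an editable copy in custom-ec; nothing is built until you run the final eb command yourself.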
regards,
Kenneth
On 17/02/2022 09:52, Heckes, Frank wrote:
Hi all,
I couldn't find a solution to my problem in the mail archive or
via Google.
I tried to build intel-2019b.eb. The process runs successfully until it
reaches the sanity check:
== sanity checking...
== ... (took 1 secs)
== FAILED: Installation ended unsuccessfully (build directory:
/opt/local/easybuild/build/impi/2018.5.288/iccifort-2019.5.281): build
failed (first 300 chars): Sanity check failed: sanity check command
mpirun -n 36
/opt/local/easybuild/build/impi/2018.5.288/iccifort-2019.5.281/mpi_test
exited with code 11 (output:
As iccifort-2019.5.281 is already available, I loaded this module in
another session. Starting the test manually leads to the errors below
(see ‘Errors without FI- environment variables’).
Setting the environment variable with export FI_PROVIDER=tcp fixes the
problem. Now the test completes:
mpirun -np 36
/opt/local/easybuild/build/impi/2018.5.288/iccifort-2019.5.281/mpi_test
Hello world: rank 0 of 36 running on atlas52
Hello world: rank 1 of 36 running on atlas52
Hello world: rank 2 of 36 running on atlas52
. . .
With FI_PROVIDER=verbs the error appears again (the node has a valid
OFED software stack, an IP address assigned to the HCA, and is
operational for other MPI apps).
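To compare the two providers side by side, a loop along the following lines could be used. This is only a sketch: it assumes mpirun is on the PATH once the module is loaded, MPI_TEST and NP are placeholder variables for the test binary and rank count shown above, and try_provider is a hypothetical helper name. FI_LOG_LEVEL is the libfabric logging variable; setting it to warn may show why a provider is rejected.

```shell
#!/bin/sh
# Placeholders: point MPI_TEST at the mpi_test binary from the build directory.
MPI_TEST="${MPI_TEST:-./mpi_test}"
NP="${NP:-36}"

# Hypothetical helper: run the MPI test once with the given libfabric provider.
try_provider() {
    provider="$1"
    if ! command -v mpirun >/dev/null 2>&1; then
        echo "skip $provider: mpirun not found"
        return 0
    fi
    echo "== FI_PROVIDER=$provider"
    # The variables are set only for this one mpirun invocation.
    FI_PROVIDER="$provider" FI_LOG_LEVEL=warn mpirun -np "$NP" "$MPI_TEST"
    echo "exit code: $?"
}

for p in tcp verbs; do
    try_provider "$p"
done
```

Because the variables are scoped to each mpirun call, the surrounding shell environment stays unchanged between the two runs.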
Two questions:
* How can I set up the environment variables so that eb uses them during
the test? (Running export FI_PROVIDER=…; eb intel2019b --robot doesn’t
help.)
* Although I can see the verbs provider (when running fi_info), I run into
an error. Am I missing a dependency of Intel MPI?
Many thanks in advance for any help and advice.
Cheers,
-Frank Heckes
------------------------------ Errors without FI- environment variables
mpirun -np 36
/opt/local/easybuild/build/impi/2018.5.288/iccifort-2019.5.281/mpi_test
Abort(1091471) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(703).......:
MPID_Init(923)..............:
MPIDI_OFI_mpi_init_hook(883): OFI addrinfo() failure
Abort(1091471) on node 11 (rank 11 in comm 0): Fatal error in PMPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(703).......:
MPID_Init(923)..............:
MPIDI_OFI_mpi_init_hook(883): OFI addrinfo() failure
Abort(1091471) on node 12 (rank 12 in comm 0): Fatal error in PMPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(703).......:
MPID_Init(923)..............:
MPIDI_OFI_mpi_init_hook(883): OFI addrinfo() failure
Abort(1091471) on node 14 (rank 14 in comm 0): Fatal error in PMPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(703).......:
MPID_Init(923)..............:
MPIDI_OFI_mpi_init_hook(883): OFI addrinfo() failure