Hi Kenneth,

Kenneth Hoste <[email protected]> writes:
> Hi Loris,
>
> On 10/09/2021 13:09, Loris Bennett wrote:
>> Hi,
>>
>> When building
>>
>>   impi/2021.2.0-intel-compilers-2021.2.0
>>
>> with EB 4.4.2 I am getting the following error
>>
>> == 2021-09-10 11:30:42,184 run.py:233 INFO running cmd: mpirun -n 40
>> /trinity/shared/easybuild/build/impi/2021.2.0/intel-compilers-2021.2.0/mpi_test
>> == 2021-09-10 11:30:43,012 run.py:635 INFO parse_log_for_error msg:
>> Command used: mpirun -n 40
>> /trinity/shared/easybuild/build/impi/2021.2.0/intel-compilers-2021.2.0/mpi_test
>> == 2021-09-10 11:30:43,013 run.py:637 INFO parse_log_for_error (some may
>> be harmless) regExp (?<![(,-]|\w)(?:error|segmentation
>> fault|failed)(?![(,-]|\.?\w) found:
>> admin.curta.zedat.fu-berlin.de.17654hfi_userinit_internal: assign_context
>> command failed: Cannot allocate memory
>> admin.curta.zedat.fu-berlin.de.17654hfi_userinit_internal: assign_context
>> command failed: Cannot allocate memory
>> admin.curta.zedat.fu-berlin.de.17654hfi_userinit_internal: assign_context
>> command failed: Cannot allocate memory
>> admin.curta.zedat.fu-berlin.de.17654hfi_userinit_internal: assign_context
>> command failed: Cannot allocate memory
>> Abort(1615503) on node 17 (rank 17 in comm 0): Fatal error in PMPI_Init:
>> Other MPI error, error stack:
>> create_endpoint(2284)........: OFI endpoint open failed
>> (ofi_init.c:2284:create_endpoint:Invalid argument)
>> == 2021-09-10 11:30:43,014 run.py:594 WARNING Found 6 errors in command
>> output (output: admin.curta.zedat.fu-berlin.de.17654hfi_userinit_internal:
>> assign_context command failed: Cannot allocate memory
>> admin.curta.zedat.fu-berlin.de.17654hfi_userinit_internal:
>> assign_context command failed: Cannot allocate memory
>> admin.curta.zedat.fu-berlin.de.17654hfi_userinit_internal:
>> assign_context command failed: Cannot allocate memory
>> admin.curta.zedat.fu-berlin.de.17654hfi_userinit_internal:
>> assign_context command failed: Cannot allocate memory
>> Abort(1615503) on node 17 (rank 17 in comm 0): Fatal error in
>> PMPI_Init: Other MPI error, error stack:
>> create_endpoint(2284)........: OFI endpoint open failed
>> (ofi_init.c:2284:create_endpoint:Invalid argument))
>>
>> Any ideas what might be going wrong?
>
> In what type of environment are you running this? A Slurm job with
> restricted available memory?

No, I just ran this directly on our admin node.

> I assume the system you're seeing this on has 40 cores (based on the
> "mpirun -n 40")?

No, the node has 32 cores. I am not sure where the 40 is coming from.

> Can you try using "eb --parallel 10" or "eb --parallel 2" to restrict
> the number of MPI processes it's starting for the test, and see if that
> helps?

Using eb --parallel 2 made no difference; I got the same error message.

I then ran the build within a Slurm job with '--parallel 6' and 1 GB per
task, and this was successful. It seems that the build has to be run
within an environment which allows the sanity check to start MPI
processes. Assuming that is indeed the case, should I have been able to
read that somewhere?

Cheers,

Loris

-- 
Dr. Loris Bennett (Hr./Mr.)
ZEDAT, Freie Universität Berlin         Email [email protected]
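A note on the "Found 6 errors" line in the quoted log: the count appears to be the number of output lines matching the regular expression EasyBuild prints ("some may be harmless"), not six distinct failures. Below is a minimal Python sketch of how that count can arise. It assumes the pattern is applied case-insensitively to each output line separately, and that the MPI output consisted of exactly the six messages quoted (the original line breaks were lost in the mail). It is not EasyBuild's own code; `ERROR_REGEX`, `OUTPUT_LINES` and `flagged_lines` are hypothetical names.

```python
import re

# Pattern copied verbatim from the parse_log_for_error line in the log.
# Assumption: it is matched case-insensitively against each line on its own.
ERROR_REGEX = re.compile(
    r"(?<![(,-]|\w)(?:error|segmentation fault|failed)(?![(,-]|\.?\w)",
    re.IGNORECASE,
)

# Hypothetical reconstruction of the mpirun output, one message per line.
HFI_LINE = ("admin.curta.zedat.fu-berlin.de.17654hfi_userinit_internal: "
            "assign_context command failed: Cannot allocate memory")
OUTPUT_LINES = [HFI_LINE] * 4 + [
    "Abort(1615503) on node 17 (rank 17 in comm 0): Fatal error in PMPI_Init: "
    "Other MPI error, error stack:",
    "create_endpoint(2284)........: OFI endpoint open failed "
    "(ofi_init.c:2284:create_endpoint:Invalid argument)",
]


def flagged_lines(lines):
    """Return every line containing at least one match of ERROR_REGEX."""
    return [line for line in lines if ERROR_REGEX.search(line)]


if __name__ == "__main__":
    hits = flagged_lines(OUTPUT_LINES)
    print(f"Found {len(hits)} errors")
    for line in hits:
        print("  " + line)
```

The lookarounds exclude matches directly followed by "(", "," or "-" (or preceded by one of those or a word character), so "error," on its own would not be flagged; the Abort line is still caught via "Fatal error in". The real problem is the four "Cannot allocate memory" messages from the hfi1 (Omni-Path) driver, which the other lines follow from.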

