Loris Bennett <[email protected]> writes:
> Hi Kenneth,
>
> Kenneth Hoste <[email protected]> writes:
>
>> Hi Loris,
>>
>> On 10/09/2021 13:09, Loris Bennett wrote:
>>> Hi,
>>>
>>> When building
>>>
>>> impi/2021.2.0-intel-compilers-2021.2.0
>>>
>>> with EB 4.4.2 I am getting the following error
>>>
>>> == 2021-09-10 11:30:42,184 run.py:233 INFO running cmd: mpirun -n 40
>>> /trinity/shared/easybuild/build/impi/2021.2.0/intel-compilers-2021.2.0/mpi_test
>>> == 2021-09-10 11:30:43,012 run.py:635 INFO parse_log_for_error msg:
>>> Command used: mpirun -n 40
>>> /trinity/shared/easybuild/build/impi/2021.2.0/intel-compilers-2021.2.0/mpi_test
>>> == 2021-09-10 11:30:43,013 run.py:637 INFO parse_log_for_error (some may
>>> be harmless) regExp (?<![(,-]|\w)(?:error|segmentation
>>> fault|failed)(?![(,-]|\.?\w) found:
>>> admin.curta.zedat.fu-berlin.de.17654hfi_userinit_internal:
>>> assign_context command failed: Cannot allocate memory
>>> admin.curta.zedat.fu-berlin.de.17654hfi_userinit_internal:
>>> assign_context command failed: Cannot allocate memory
>>> admin.curta.zedat.fu-berlin.de.17654hfi_userinit_internal:
>>> assign_context command failed: Cannot allocate memory
>>> admin.curta.zedat.fu-berlin.de.17654hfi_userinit_internal:
>>> assign_context command failed: Cannot allocate memory
>>> Abort(1615503) on node 17 (rank 17 in comm 0): Fatal error in PMPI_Init:
>>> Other MPI error, error stack:
>>> create_endpoint(2284)........: OFI endpoint open failed
>>> (ofi_init.c:2284:create_endpoint:Invalid argument)
>>> == 2021-09-10 11:30:43,014 run.py:594 WARNING Found 6 errors in command
>>> output (output: admin.curta.zedat.fu-berlin.de.17654hfi_userinit_internal:
>>> assign_context command failed: Cannot allocate memory
>>> admin.curta.zedat.fu-berlin.de.17654hfi_userinit_internal:
>>> assign_context command failed: Cannot allocate memory
>>> admin.curta.zedat.fu-berlin.de.17654hfi_userinit_internal:
>>> assign_context command failed: Cannot allocate memory
>>> admin.curta.zedat.fu-berlin.de.17654hfi_userinit_internal:
>>> assign_context command failed: Cannot allocate memory
>>> Abort(1615503) on node 17 (rank 17 in comm 0): Fatal error in
>>> PMPI_Init: Other MPI error, error stack:
>>> create_endpoint(2284)........: OFI endpoint open failed
>>> (ofi_init.c:2284:create_endpoint:Invalid argument))
>>>
>>> Any ideas what might be going wrong?
>>
>> In what type of environment are you running this? A Slurm job with
>> restricted available memory?
>
> No, I just ran this directly on our admin node.
>
>> I assume the system you're seeing this on has 40 cores (based on the
>> "mpirun -n 40")?
>
> No, the node has 32 cores. I am not sure where the 40 is coming from.

Sorry, the node does indeed have 40. I was looking at the wrong window :-/

>> Can you try using "eb --parallel 10" or "eb --parallel 2" to restrict
>> the number of MPI processes it's starting for the test, and see if that
>> helps?
>
> Using
>
> eb --parallel 2
>
> made no difference - I got the same error message.
>
> I then ran the build within a Slurm job with '--parallel 6' and 1 GB per
> task and this was successful.
>
> It seems like the build has to be run within an environment which will
> allow the sanity check to run an MPI process. Assuming that is indeed
> the case, should I have been able to read that somewhere?
>
> Cheers,
>
> Loris

-- 
Dr. Loris Bennett (Hr./Mr.)
ZEDAT, Freie Universität Berlin         Email [email protected]
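The "Found 6 errors" warning in the quoted log comes from matching the regular expression printed on the "parse_log_for_error ... regExp" line against the output of the mpirun test. The minimal Python sketch below assumes the pattern is applied line by line and case-insensitively. That is what the log suggests, but it is not necessarily EasyBuild's exact code in run.py. Under that assumption, the four "assign_context command failed" lines plus the "Fatal error in PMPI_Init" and "OFI endpoint open failed" lines account for the six hits. The sample output is a reconstruction, since the mail client re-wrapped the original lines, and the helper name "error_lines" is hypothetical.

```python
import re

# Pattern copied verbatim from the "parse_log_for_error ... regExp" log line.
# Assumption: the match is case-insensitive.
ERROR_PATTERN = re.compile(
    r"(?<![(,-]|\w)(?:error|segmentation fault|failed)(?![(,-]|\.?\w)",
    re.IGNORECASE,
)

# Assumed reconstruction of the mpirun output: the line breaks are a guess,
# because the email client re-wrapped the original log lines.
SAMPLE_OUTPUT = "\n".join(
    ["admin.curta.zedat.fu-berlin.de.17654hfi_userinit_internal: "
     "assign_context command failed: Cannot allocate memory"] * 4
    + [
        "Abort(1615503) on node 17 (rank 17 in comm 0): Fatal error in "
        "PMPI_Init: Other MPI error, error stack:",
        "create_endpoint(2284)........: OFI endpoint open failed "
        "(ofi_init.c:2284:create_endpoint:Invalid argument)",
    ]
)


def error_lines(text):
    """Hypothetical helper: return each line containing at least one pattern match."""
    return [line for line in text.splitlines() if ERROR_PATTERN.search(line)]


if __name__ == "__main__":
    hits = error_lines(SAMPLE_OUTPUT)
    print(f"Found {len(hits)} errors")
    for line in hits:
        print("  " + line)
```

Running it prints "Found 6 errors" followed by the six matched lines. The lookbehind and lookahead keep words such as "errors" or "(error" from counting as hits.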

