Hi Kenneth,

Kenneth Hoste <[email protected]> writes:

> Hi Loris,
>
> On 10/09/2021 13:09, Loris Bennett wrote:
>> Hi,
>>
>> When building
>>   
>>    impi/2021.2.0-intel-compilers-2021.2.0
>>
>> with EB 4.4.2 I am getting the following error
>>
>>    == 2021-09-10 11:30:42,184 run.py:233 INFO running cmd: mpirun -n 40 
>> /trinity/shared/easybuild/build/impi/2021.2.0/intel-compilers-2021.2.0/mpi_test
>>    == 2021-09-10 11:30:43,012 run.py:635 INFO parse_log_for_error msg: 
>> Command used: mpirun -n 40 
>> /trinity/shared/easybuild/build/impi/2021.2.0/intel-compilers-2021.2.0/mpi_test
>>    == 2021-09-10 11:30:43,013 run.py:637 INFO parse_log_for_error (some may 
>> be harmless) regExp (?<![(,-]|\w)(?:error|segmentation 
>> fault|failed)(?![(,-]|\.?\w) found:
>>    admin.curta.zedat.fu-berlin.de.17654hfi_userinit_internal: assign_context 
>> command failed: Cannot allocate memory
>>    admin.curta.zedat.fu-berlin.de.17654hfi_userinit_internal: assign_context 
>> command failed: Cannot allocate memory
>>    admin.curta.zedat.fu-berlin.de.17654hfi_userinit_internal: assign_context 
>> command failed: Cannot allocate memory
>>    admin.curta.zedat.fu-berlin.de.17654hfi_userinit_internal: assign_context 
>> command failed: Cannot allocate memory
>>    Abort(1615503) on node 17 (rank 17 in comm 0): Fatal error in PMPI_Init: 
>> Other MPI error, error stack:
>>    create_endpoint(2284)........: OFI endpoint open failed 
>> (ofi_init.c:2284:create_endpoint:Invalid argument)
>>    == 2021-09-10 11:30:43,014 run.py:594 WARNING Found 6 errors in command 
>> output (output: admin.curta.zedat.fu-berlin.de.17654hfi_userinit_internal: 
>> assign_context command failed: Cannot allocate memory
>>            admin.curta.zedat.fu-berlin.de.17654hfi_userinit_internal: 
>> assign_context command failed: Cannot allocate memory
>>            admin.curta.zedat.fu-berlin.de.17654hfi_userinit_internal: 
>> assign_context command failed: Cannot allocate memory
>>            admin.curta.zedat.fu-berlin.de.17654hfi_userinit_internal: 
>> assign_context command failed: Cannot allocate memory
>>            Abort(1615503) on node 17 (rank 17 in comm 0): Fatal error in 
>> PMPI_Init: Other MPI error, error stack:
>>            create_endpoint(2284)........: OFI endpoint open failed 
>> (ofi_init.c:2284:create_endpoint:Invalid argument))
>>
>> Any ideas what might be going wrong?
>
> In what type of environment are you running this? A Slurm job with 
> restricted available memory?

No, I just ran this directly on our admin node.

> I assume the system you're seeing this on has 40 cores (based on the 
> "mpirun -n 40")?

No, the node has 32 cores.  I am not sure where the 40 is coming from.

> Can you try using "eb --parallel 10" or "eb --parallel 2" to restrict 
> the number of MPI processes it's starting for the test, and see if that 
> helps?

Using

  eb --parallel 2

made no difference - I got the same error message.

I then ran the build within a Slurm job with '--parallel 6' and 1 GB per
task and this was successful.
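
A job along those lines could be sketched as follows.  The easyconfig
file name, the script name and the exact sbatch options are assumptions
for illustration, not a record of the actual submission:

```shell
#!/bin/bash
# Sketch of a batch script, e.g. saved as build-impi.sh (hypothetical name)
# and submitted with: sbatch build-impi.sh
#SBATCH --ntasks=6          # six tasks, matching --parallel 6 below
#SBATCH --mem-per-cpu=1G    # 1 GB per task, assuming one CPU per task

# Assumed easyconfig name for impi/2021.2.0-intel-compilers-2021.2.0
eb impi-2021.2.0-intel-compilers-2021.2.0.eb --parallel 6
```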

It seems like the build has to be run within an environment which will
allow the sanity check to run an MPI process.  Assuming that is indeed
the case, should I have been able to read that somewhere?

Cheers,

Loris

-- 
Dr. Loris Bennett (Hr./Mr.)
ZEDAT, Freie Universität Berlin         Email [email protected]
