Loris Bennett <[email protected]> writes:

> Hi Kenneth,
>
> Kenneth Hoste <[email protected]> writes:
>
>> Hi Loris,
>>
>> On 10/09/2021 13:09, Loris Bennett wrote:
>>> Hi,
>>>
>>> When building
>>>   
>>>    impi/2021.2.0-intel-compilers-2021.2.0
>>>
>>> with EB 4.4.2 I am getting the following error
>>>
>>>    == 2021-09-10 11:30:42,184 run.py:233 INFO running cmd: mpirun -n 40 
>>> /trinity/shared/easybuild/build/impi/2021.2.0/intel-compilers-2021.2.0/mpi_test
>>>    == 2021-09-10 11:30:43,012 run.py:635 INFO parse_log_for_error msg: 
>>> Command used: mpirun -n 40 
>>> /trinity/shared/easybuild/build/impi/2021.2.0/intel-compilers-2021.2.0/mpi_test
>>>    == 2021-09-10 11:30:43,013 run.py:637 INFO parse_log_for_error (some may 
>>> be harmless) regExp (?<![(,-]|\w)(?:error|segmentation 
>>> fault|failed)(?![(,-]|\.?\w) found:
>>>    admin.curta.zedat.fu-berlin.de.17654hfi_userinit_internal: 
>>> assign_context command failed: Cannot allocate memory
>>>    admin.curta.zedat.fu-berlin.de.17654hfi_userinit_internal: 
>>> assign_context command failed: Cannot allocate memory
>>>    admin.curta.zedat.fu-berlin.de.17654hfi_userinit_internal: 
>>> assign_context command failed: Cannot allocate memory
>>>    admin.curta.zedat.fu-berlin.de.17654hfi_userinit_internal: 
>>> assign_context command failed: Cannot allocate memory
>>>    Abort(1615503) on node 17 (rank 17 in comm 0): Fatal error in PMPI_Init: 
>>> Other MPI error, error stack:
>>>    create_endpoint(2284)........: OFI endpoint open failed 
>>> (ofi_init.c:2284:create_endpoint:Invalid argument)
>>>    == 2021-09-10 11:30:43,014 run.py:594 WARNING Found 6 errors in command 
>>> output (output: admin.curta.zedat.fu-berlin.de.17654hfi_userinit_internal: 
>>> assign_context command failed: Cannot allocate memory
>>>            admin.curta.zedat.fu-berlin.de.17654hfi_userinit_internal: 
>>> assign_context command failed: Cannot allocate memory
>>>            admin.curta.zedat.fu-berlin.de.17654hfi_userinit_internal: 
>>> assign_context command failed: Cannot allocate memory
>>>            admin.curta.zedat.fu-berlin.de.17654hfi_userinit_internal: 
>>> assign_context command failed: Cannot allocate memory
>>>            Abort(1615503) on node 17 (rank 17 in comm 0): Fatal error in 
>>> PMPI_Init: Other MPI error, error stack:
>>>            create_endpoint(2284)........: OFI endpoint open failed 
>>> (ofi_init.c:2284:create_endpoint:Invalid argument))
>>>
>>> Any ideas what might be going wrong?
>>
>> In what type of environment are you running this? A Slurm job with 
>> restricted available memory?
>
> No, I just ran this directly on our admin node.
>
>> I assume the system you're seeing this on has 40 cores (based on the 
>> "mpirun -n 40")?
>
> No, the node has 32 cores.  I am not sure where the 40 is coming from.

Sorry, the node does indeed have 40 cores.  I was looking at the wrong window :-/

>> Can you try using "eb --parallel 10" or "eb --parallel 2" to restrict 
>> the number of MPI processes it's starting for the test, and see if that 
>> helps?
>
> Using
>
>   eb --parallel 2
>
> made no difference - I got the same error message.
>
> I then ran the build within a Slurm job with '--parallel 6' and 1 GB per
> task and this was successful.
>
> It seems like the build has to be run within an environment which will
> allow the sanity check to run an MPI process.  Assuming that is indeed
> the case, should I have been able to read that somewhere?
>
> Cheers,
>
> Loris
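
For reference, the "Found 6 errors" warning in the log comes from a regex
scan of the mpirun output, not from six separate failures.  Below is a
minimal sketch of such a scan using the pattern printed in the
parse_log_for_error line.  The case-insensitive flag, the one-hit-per-line
counting and the helper name flagged_lines are assumptions made for
illustration, not something the log confirms about EasyBuild's internals.

```python
import re

# Pattern copied from the parse_log_for_error line in the log.
# re.IGNORECASE is an assumption; the log does not show the flags used.
ERROR_RE = re.compile(
    r"(?<![(,-]|\w)(?:error|segmentation fault|failed)(?![(,-]|\.?\w)",
    re.IGNORECASE,
)

# mpirun output from the log, with the mail client's line wraps undone.
HFI_LINE = (
    "admin.curta.zedat.fu-berlin.de.17654hfi_userinit_internal: "
    "assign_context command failed: Cannot allocate memory"
)
OUTPUT = "\n".join(
    [HFI_LINE] * 4
    + [
        "Abort(1615503) on node 17 (rank 17 in comm 0): Fatal error in "
        "PMPI_Init: Other MPI error, error stack:",
        "create_endpoint(2284)........: OFI endpoint open failed "
        "(ofi_init.c:2284:create_endpoint:Invalid argument)",
    ]
)


def flagged_lines(output):
    """Return the lines of `output` that contain a match (hypothetical helper)."""
    return [line for line in output.splitlines() if ERROR_RE.search(line)]


if __name__ == "__main__":
    hits = flagged_lines(OUTPUT)
    print(f"{len(hits)} flagged lines")
    for line in hits:
        print("  " + line)
```

Under these assumptions every one of the six output lines is flagged, which
is consistent with the count in the warning.  The lookarounds only stop the
keywords from matching inside identifiers or directly next to "(", "," or
"-", so a string such as "error_count=0" would not be flagged.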
-- 
Dr. Loris Bennett (Hr./Mr.)
ZEDAT, Freie Universität Berlin         Email [email protected]
