On Mon, 30 Apr 2018 at 05:31, Ole Holm Nielsen <[email protected]> wrote:
When we build on nodes with Intel Omni-Path software installed, the
build of "iomkl" fails:
$ eb iomkl-2018.02.eb -r
== temporary log file in case of crash /tmp/eb-MDUNmr/easybuild-UxI3tr.log
== resolving dependencies ...
== processing EasyBuild easyconfig
/home/modules/software/EasyBuild/3.6.0/lib/python2.7/site-packages/easybuild_easyconfigs-3.6.0-py2.7.egg/easybuild/easyconfigs/o/OpenMPI/OpenMPI-2.1.3-iccifort-2018.2.199-GCC-6.4.0-2.28.eb
== building and installing OpenMPI/2.1.3-iccifort-2018.2.199-GCC-6.4.0-2.28...
== fetching files...
== creating build dir, resetting environment...
== unpacking...
== patching...
== preparing...
== configuring...
== building...
== FAILED: Installation ended unsuccessfully (build directory: /home/modules/build/OpenMPI/2.1.3/iccifort-2018.2.199-GCC-6.4.0-2.28):
build failed (first 300 chars): cmd " make -j 48 " exited with exit code 2 and output:
Making all in config
make[1]: Entering directory `/home/modules/build/OpenMPI/2.1.3/iccifort-2018.2.199-GCC-6.4.0-2.28/openmpi-2.1.3/config'
make[1]: Nothing to be done for `all'.
make[1]: Leaving directory `/home/modules/build/OpenMPI/2.1.3/icc
== Results of the build can be found in the log file(s) /tmp/eb-MDUNmr/easybuild-OpenMPI-2.1.3-20180430.105741.vWNYR.log
ERROR: Build of
/home/modules/software/EasyBuild/3.6.0/lib/python2.7/site-packages/easybuild_easyconfigs-3.6.0-py2.7.egg/easybuild/easyconfigs/o/OpenMPI/OpenMPI-2.1.3-iccifort-2018.2.199-GCC-6.4.0-2.28.eb
failed (err: 'build failed (first 300 chars): cmd " make -j 48 " exited with exit code 2 and output:\nMaking all in config\nmake[1]: Entering directory `/home/modules/build/OpenMPI/2.1.3/iccifort-2018.2.199-GCC-6.4.0-2.28/openmpi-2.1.3/config\'\nmake[1]: Nothing to be done for `all\'.\nmake[1]: Leaving directory `/home/modules/build/OpenMPI/2.1.3/icc')
The OpenMPI error log file contains near the end:
CCLD libopen-pal.la
ld: cannot find -lpciaccess
ld: cannot find -lxml2
make[2]: *** [libopen-pal.la] Error 1
make[2]: Leaving directory `/home/modules/build/OpenMPI/2.1.3/iccifort-2018.2.199-GCC-6.4.0-2.28/openmpi-2.1.3/opal'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/modules/build/OpenMPI/2.1.3/iccifort-2018.2.199-GCC-6.4.0-2.28/openmpi-2.1.3/opal'
make: *** [all-recursive] Error 1
(at easybuild/tools/run.py:501 in parse_cmd_output)
== 2018-04-30 11:03:17,152 easyblock.py:2702 WARNING build failed (first 300 chars): cmd " make -j 48 " exited with exit code 2 and output:
Making all in config
make[1]: Entering directory `/home/modules/build/OpenMPI/2.1.3/iccifort-2018.2.199-GCC-6.4.0-2.28/openmpi-2.1.3/config'
make[1]: Nothing to be done for `all'.
make[1]: Leaving directory `/home/modules/build/OpenMPI/2.1.3/icc
== 2018-04-30 11:03:17,152 easyblock.py:280 INFO Closing log for application name OpenMPI version 2.1.3
Question: Can anyone point to the cause of this error? Did OpenMPI
2.1.3 introduce this error?
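One way to narrow down the "ld: cannot find -lpciaccess" / "-lxml2" failure is to check whether the unversioned libpciaccess.so / libxml2.so files that "-l" needs actually exist in the linker's search directories. The sketch below is a simplified model of GNU ld's default search (in each directory, lib<name>.so is tried before lib<name>.a); the helper name find_link_library and the directory list are hypothetical, and linker scripts, -Bstatic and sysroot handling are ignored. A common cause of this kind of error is that only the versioned runtime file (e.g. libpciaccess.so.0) is installed while the unversioned link comes from a development package; whether that is what happens on the Omni-Path nodes is an assumption to verify, not a confirmed diagnosis.

```python
import os

def find_link_library(name, search_dirs):
    """Return the first file GNU ld would pick for -l<name>, or None.

    Simplified model of ld's default dynamic search: each directory is
    checked in order, looking for lib<name>.so before lib<name>.a.
    Linker scripts, -Bstatic and sysroot handling are ignored.
    """
    for directory in search_dirs:
        for candidate in (f"lib{name}.so", f"lib{name}.a"):
            path = os.path.join(directory, candidate)
            if os.path.exists(path):
                return path
    return None

# Hypothetical search path: on a real node, use the -L directories from the
# failing link command plus the system defaults.
search_dirs = ["/usr/lib64", "/usr/lib"]
for lib in ("pciaccess", "xml2"):
    print(lib, "->", find_link_library(lib, search_dirs))
```

If a directory listing shows only lib<name>.so.N, the library is present for running programs but not for linking against with "-l<name>".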
Extra information: The issue
https://github.com/easybuilders/easybuild-easyconfigs/issues/5805 is
fixed with EB 3.6.0 and OpenMPI 2.1.3. On our systems with Mellanox
Infiniband software installed, this line in
OpenMPI-2.1.3-iccifort-2018.2.199-GCC-6.4.0-2.28.eb *does* fix issue
5805:
configopts += '--without-ucx '  # hard disable UCX, to dance around bug (https://github.com/open-mpi/ompi/issues/4345)
On the Intel Omni-Path system I tried commenting out this line, but the
build still fails with the same error.
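Easyconfig files are Python, and configopts is an ordinary string that each "+=" extends, which is why every fragment carries a trailing space. A minimal sketch of the effect, assuming configure's arguments come from shell-style splitting of the accumulated string (a simplification, not EasyBuild's actual code path); the earlier option is only an example and the helper configure_args is hypothetical.

```python
import shlex

# Simplified model (an assumption, not EasyBuild's code): the easyconfig is
# Python, configopts is a plain string, and configure later receives it
# split into arguments the way a POSIX shell would split it.
configopts = '--enable-mpirun-prefix-by-default '  # example earlier option
configopts += '--without-ucx '  # hard disable UCX, as in the easyconfig

def configure_args(opts):
    """Return the argument list a shell would build from the opts string."""
    return shlex.split(opts)

print(configure_args(configopts))
# ['--enable-mpirun-prefix-by-default', '--without-ucx']

# Without the trailing space on the first fragment, the two options fuse
# into a single argument that configure would not recognize.
broken = '--enable-mpirun-prefix-by-default' + '--without-ucx '
print(configure_args(broken))
# ['--enable-mpirun-prefix-by-default--without-ucx']
```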
/Ole
On 03/05/2018 03:44 PM, Åke Sandgren wrote:
> To clarify, it's a bug in the OpenMPI configure script when
dealing with
> UCX which they haven't fixed.
>
> On 03/05/2018 03:36 PM, Balázs Hajgató wrote:
>> Dear Ole,
>>
>> use
>> configopts += '--without-ucx '
>>
>> in the OpenMPI easyconfig
>>
>> Sincerely,
>>
>> Balazs
>>
>>>>>
>>> Thanks for any suggestions!
>>>
>>
>
-- Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark,
Building 307, DK-2800 Kongens Lyngby, Denmark
E-mail: [email protected]
Homepage: http://dcwww.fysik.dtu.dk/~ohnielse/
Tel: (+45) 4525 3187 / Mobile (+45) 5180 1620
>> On 05/03/2018 15:27, Ole Holm Nielsen wrote:
>>> Using EB 3.5.2 I'm trying to build the latest iomkl:
>>>
>>> eb iomkl-2018a.eb -r
>>>
>>> This works like a charm on 2 of our 3 binary architectures, but on our
>>> Sandy Bridge nodes with Mellanox Infiniband the build aborts:
>>>
>>> # eb iomkl-2018a.eb -r
>>> == temporary log file in case of crash
>>> /tmp/eb-Dnjmix/easybuild-6WeIK9.log
>>> == resolving dependencies ...
>>> == processing EasyBuild easyconfig
>>> /home/modules/software/EasyBuild/3.5.2/lib/python2.7/site-packages/easybuild_easyconfigs-3.5.2-py2.7.egg/easybuild/easyconfigs/o/OpenMPI/OpenMPI-2.1.2-iccifort-2018.1.163-GCC-6.4.0-2.28.eb
>>> == building and installing
>>> OpenMPI/2.1.2-iccifort-2018.1.163-GCC-6.4.0-2.28...
>>> == fetching files...
>>> == creating build dir, resetting environment...
>>> == unpacking...
>>> == patching...
>>> == preparing...
>>> == configuring...
>>> == building...
>>> == FAILED: Installation ended unsuccessfully (build directory:
>>> /home/modules/build/OpenMPI/2.1.2/iccifort-2018.1.163-GCC-6.4.0-2.28):
>>> build failed (first 300 chars): cmd " make -j 16 " exited with exit
>>> code 2 and output:
>>> Making all in config
>>> make[1]: Entering directory
>>> `/home/modules/build/OpenMPI/2.1.2/iccifort-2018.1.163-GCC-6.4.0-2.28/openmpi-2.1.2/config'
>>> make[1]: Nothing to be done for `all'.
>>> make[1]: Leaving directory `/home/modules/build/OpenMPI/2.1.2/icc
>>> ...
>>>
>>> The end of the log file
>>> /tmp/eb-Dnjmix/easybuild-OpenMPI-2.1.2-20180305.144907.SAFtP.log
>>> contains:
>>>
>>> libtool: error: require no space between '-L' and '-lrt'
>>> make[2]: *** [libmca_pml_ucx.la] Error 1
>>> make[2]: Leaving directory
>>> `/home/modules/build/OpenMPI/2.1.2/iccifort-2018.1.163-GCC-6.4.0-2.28/openmpi-2.1.2/ompi/mca/pml/ucx'
>>> make[1]: *** [all-recursive] Error 1
>>>
>>> I can't make out what's causing this. Perhaps the build server has a
>>> library installed that triggers the error?
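As far as I understand libtool's link mode, the message "require no space between '-L' and '-lrt'" means libtool received a bare '-L' argument followed by another argument, i.e. the directory part of '-L<dir>' was empty. The sketch below is a hypothetical illustration of how an empty, configure-substituted library directory produces exactly that; it fits Åke's remark that the OpenMPI configure script mishandles UCX, but the variable involved is an assumption, and the helpers link_args and libtool_l_check (which mimics only this one libtool check) are made up for illustration.

```python
import shlex

def link_args(libdir):
    """Hypothetical model of a configure-generated link fragment
    "-L<libdir> -lrt", split into arguments the way a POSIX shell would."""
    return shlex.split(f"-L{libdir} -lrt")

def libtool_l_check(args):
    """Mimic (only) libtool's link-mode check on a bare '-L' argument."""
    for i, arg in enumerate(args):
        if arg == "-L":
            if i + 1 < len(args):
                return f"require no space between '-L' and '{args[i + 1]}'"
            return "need path for '-L' option"
    return None

print(libtool_l_check(link_args("/opt/ucx/lib")))  # None
print(libtool_l_check(link_args("")))              # require no space between '-L' and '-lrt'
```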