When we build on nodes with Intel Omni-Path software installed, the
build of "iomkl" fails:
$ eb iomkl-2018.02.eb -r
== temporary log file in case of crash /tmp/eb-MDUNmr/easybuild-UxI3tr.log
== resolving dependencies ...
== processing EasyBuild easyconfig
/home/modules/software/EasyBuild/3.6.0/lib/python2.7/site-packages/easybuild_easyconfigs-3.6.0-py2.7.egg/easybuild/easyconfigs/o/OpenMPI/OpenMPI-2.1.3-iccifort-2018.2.199-GCC-6.4.0-2.28.eb
== building and installing
OpenMPI/2.1.3-iccifort-2018.2.199-GCC-6.4.0-2.28...
== fetching files...
== creating build dir, resetting environment...
== unpacking...
== patching...
== preparing...
== configuring...
== building...
== FAILED: Installation ended unsuccessfully (build directory:
/home/modules/build/OpenMPI/2.1.3/iccifort-2018.2.199-GCC-6.4.0-2.28):
build failed (first 300 chars): cmd " make -j 48 " exited with exit code
2 and output:
Making all in config
make[1]: Entering directory
`/home/modules/build/OpenMPI/2.1.3/iccifort-2018.2.199-GCC-6.4.0-2.28/openmpi-2.1.3/config'
make[1]: Nothing to be done for `all'.
make[1]: Leaving directory `/home/modules/build/OpenMPI/2.1.3/icc
== Results of the build can be found in the log file(s)
/tmp/eb-MDUNmr/easybuild-OpenMPI-2.1.3-20180430.105741.vWNYR.log
ERROR: Build of
/home/modules/software/EasyBuild/3.6.0/lib/python2.7/site-packages/easybuild_easyconfigs-3.6.0-py2.7.egg/easybuild/easyconfigs/o/OpenMPI/OpenMPI-2.1.3-iccifort-2018.2.199-GCC-6.4.0-2.28.eb
failed (err: 'build failed (first 300 chars): cmd " make -j 48 " exited
with exit code 2 and output:\nMaking all in config\nmake[1]: Entering
directory
`/home/modules/build/OpenMPI/2.1.3/iccifort-2018.2.199-GCC-6.4.0-2.28/openmpi-2.1.3/config\'\nmake[1]:
Nothing to be done for `all\'.\nmake[1]: Leaving directory
`/home/modules/build/OpenMPI/2.1.3/icc')
Near its end, the OpenMPI build log file contains:
CCLD libopen-pal.la
ld: cannot find -lpciaccess
ld: cannot find -lxml2
make[2]: *** [libopen-pal.la] Error 1
make[2]: Leaving directory
`/home/modules/build/OpenMPI/2.1.3/iccifort-2018.2.199-GCC-6.4.0-2.28/openmpi-2.1.3/opal'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory
`/home/modules/build/OpenMPI/2.1.3/iccifort-2018.2.199-GCC-6.4.0-2.28/openmpi-2.1.3/opal'
make: *** [all-recursive] Error 1
(at easybuild/tools/run.py:501 in parse_cmd_output)
== 2018-04-30 11:03:17,152 easyblock.py:2702 WARNING build failed (first
300 chars): cmd " make -j 48 " exited with exit code 2 and output:
Making all in config
make[1]: Entering directory
`/home/modules/build/OpenMPI/2.1.3/iccifort-2018.2.199-GCC-6.4.0-2.28/openmpi-2.1.3/config'
make[1]: Nothing to be done for `all'.
make[1]: Leaving directory `/home/modules/build/OpenMPI/2.1.3/icc
== 2018-04-30 11:03:17,152 easyblock.py:280 INFO Closing log for
application name OpenMPI version 2.1.3
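As a rough aid for reading long build logs like the one above, here is a small Python sketch that lists the libraries GNU ld reported as missing. It is not part of EasyBuild; the function name missing_libraries and the sample text are only illustrative, and the pattern assumes the "ld: cannot find -l<name>" wording seen in this log.

```python
import re

# Matches GNU ld messages such as "ld: cannot find -lpciaccess".
# Assumption: the linker uses exactly this wording, as in the log above.
MISSING_LIB = re.compile(r"ld: cannot find -l(\S+)")


def missing_libraries(log_text):
    """Return the library names ld could not find, in first-seen order."""
    seen = []
    for match in MISSING_LIB.finditer(log_text):
        name = match.group(1)
        if name not in seen:
            seen.append(name)
    return seen


# Excerpt copied from the OpenMPI build log quoted above.
sample = """\
CCLD libopen-pal.la
ld: cannot find -lpciaccess
ld: cannot find -lxml2
make[2]: *** [libopen-pal.la] Error 1
"""

print(missing_libraries(sample))
```

For the excerpt above this reports pciaccess and xml2, i.e. the linker could not locate libpciaccess and libxml2 at link time on this node.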
Question: Can anyone point to the cause of this error? Did OpenMPI
2.1.3 introduce this error?
Extra information: The issue
https://github.com/easybuilders/easybuild-easyconfigs/issues/5805 is
fixed with EB 3.6.0 and OpenMPI 2.1.3. On our systems with Mellanox
InfiniBand software installed, the following line in
OpenMPI-2.1.3-iccifort-2018.2.199-GCC-6.4.0-2.28.eb *does* fix issue
5805:
configopts += '--without-ucx '  # hard disable UCX, to dance around bug (https://github.com/open-mpi/ompi/issues/4345)
On the Intel Omni-Path system I tried to comment out this line, but the
build still fails with the same error.
/Ole
On 03/05/2018 03:44 PM, Åke Sandgren wrote:
To clarify, it's a bug in the OpenMPI configure script when dealing with
UCX which they haven't fixed.
On 03/05/2018 03:36 PM, Balázs Hajgató wrote:
Dear Ole,
use
configopts += '--without-ucx '
in the OpenMPI easyconfig
Sincerely,
Balazs
Thanks for any suggestions!
--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark,
Building 307, DK-2800 Kongens Lyngby, Denmark
Homepage: http://dcwww.fysik.dtu.dk/~ohnielse/
Tel: (+45) 4525 3187 / Mobile (+45) 5180 1620
On 05/03/2018 15:27, Ole Holm Nielsen wrote:
Using EB 3.5.2 I'm trying to build the latest iomkl:
eb iomkl-2018a.eb -r
This works like a charm on 2 of our 3 binary architectures, but on our
Sandy Bridge nodes with Mellanox InfiniBand the build aborts:
# eb iomkl-2018a.eb -r
== temporary log file in case of crash
/tmp/eb-Dnjmix/easybuild-6WeIK9.log
== resolving dependencies ...
== processing EasyBuild easyconfig
/home/modules/software/EasyBuild/3.5.2/lib/python2.7/site-packages/easybuild_easyconfigs-3.5.2-py2.7.egg/easybuild/easyconfigs/o/OpenMPI/OpenMPI-2.1.2-iccifort-2018.1.163-GCC-6.4.0-2.28.eb
== building and installing
OpenMPI/2.1.2-iccifort-2018.1.163-GCC-6.4.0-2.28...
== fetching files...
== creating build dir, resetting environment...
== unpacking...
== patching...
== preparing...
== configuring...
== building...
== FAILED: Installation ended unsuccessfully (build directory:
/home/modules/build/OpenMPI/2.1.2/iccifort-2018.1.163-GCC-6.4.0-2.28):
build failed (first 300 chars): cmd " make -j 16 " exited with exit
code 2 and output:
Making all in config
make[1]: Entering directory
`/home/modules/build/OpenMPI/2.1.2/iccifort-2018.1.163-GCC-6.4.0-2.28/openmpi-2.1.2/config'
make[1]: Nothing to be done for `all'.
make[1]: Leaving directory `/home/modules/build/OpenMPI/2.1.2/icc
...
The end of the log file
/tmp/eb-Dnjmix/easybuild-OpenMPI-2.1.2-20180305.144907.SAFtP.log
contains:
libtool: error: require no space between '-L' and '-lrt'
make[2]: *** [libmca_pml_ucx.la] Error 1
make[2]: Leaving directory
`/home/modules/build/OpenMPI/2.1.2/iccifort-2018.1.163-GCC-6.4.0-2.28/openmpi-2.1.2/ompi/mca/pml/ucx'
make[1]: *** [all-recursive] Error 1
I can't make out what's causing this. Perhaps the build server has a
library installed triggering an error?
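
To narrow down which component fails in a log like this, one option is to pull out the make failure lines and look at the most deeply nested one. The Python sketch below does that; the helper name failed_targets is hypothetical, and the pattern assumes the older GNU make message format shown above ("make[2]: *** [target] Error 1"). Newer make versions also print the Makefile name and line number inside the brackets, which this sketch would return as part of the target string.

```python
import re

# Matches GNU make failure lines such as
# "make[2]: *** [libmca_pml_ucx.la] Error 1".
# Assumption: the older message format seen in the log above.
FAILED_TARGET = re.compile(
    r"^make(?:\[(\d+)\])?: \*\*\* \[(.+?)\] Error (\d+)", re.MULTILINE
)


def failed_targets(log_text):
    """Return (depth, target, exit_code) tuples in log order.

    depth is the recursion level printed by make; 0 means top-level make.
    """
    return [
        (int(depth or 0), target, int(code))
        for depth, target, code in FAILED_TARGET.findall(log_text)
    ]


# Excerpt copied from the end of the OpenMPI 2.1.2 build log quoted above.
sample = """\
libtool:   error: require no space between '-L' and '-lrt'
make[2]: *** [libmca_pml_ucx.la] Error 1
make[1]: *** [all-recursive] Error 1
"""

failures = failed_targets(sample)
# The most deeply nested failure is usually the one that actually broke.
deepest = max(failures, key=lambda item: item[0])
print(deepest)
```

Here the deepest failing target is libmca_pml_ucx.la, built in ompi/mca/pml/ucx, which points at the UCX component rather than at OpenMPI as a whole.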