When we build on nodes with Intel Omni-Path software installed, the build of "iomkl" fails:

$ eb iomkl-2018.02.eb -r
== temporary log file in case of crash /tmp/eb-MDUNmr/easybuild-UxI3tr.log
== resolving dependencies ...
== processing EasyBuild easyconfig /home/modules/software/EasyBuild/3.6.0/lib/python2.7/site-packages/easybuild_easyconfigs-3.6.0-py2.7.egg/easybuild/easyconfigs/o/OpenMPI/OpenMPI-2.1.3-iccifort-2018.2.199-GCC-6.4.0-2.28.eb
== building and installing OpenMPI/2.1.3-iccifort-2018.2.199-GCC-6.4.0-2.28...
== fetching files...
== creating build dir, resetting environment...
== unpacking...
== patching...
== preparing...
== configuring...
== building...
== FAILED: Installation ended unsuccessfully (build directory: /home/modules/build/OpenMPI/2.1.3/iccifort-2018.2.199-GCC-6.4.0-2.28): build failed (first 300 chars): cmd " make -j 48 " exited with exit code 2 and output:
Making all in config
make[1]: Entering directory `/home/modules/build/OpenMPI/2.1.3/iccifort-2018.2.199-GCC-6.4.0-2.28/openmpi-2.1.3/config'
make[1]: Nothing to be done for `all'.
make[1]: Leaving directory `/home/modules/build/OpenMPI/2.1.3/icc
== Results of the build can be found in the log file(s) /tmp/eb-MDUNmr/easybuild-OpenMPI-2.1.3-20180430.105741.vWNYR.log
ERROR: Build of /home/modules/software/EasyBuild/3.6.0/lib/python2.7/site-packages/easybuild_easyconfigs-3.6.0-py2.7.egg/easybuild/easyconfigs/o/OpenMPI/OpenMPI-2.1.3-iccifort-2018.2.199-GCC-6.4.0-2.28.eb failed (err: 'build failed (first 300 chars): cmd " make -j 48 " exited with exit code 2 and output:\nMaking all in config\nmake[1]: Entering directory `/home/modules/build/OpenMPI/2.1.3/iccifort-2018.2.199-GCC-6.4.0-2.28/openmpi-2.1.3/config\'\nmake[1]: Nothing to be done for `all\'.\nmake[1]: Leaving directory `/home/modules/build/OpenMPI/2.1.3/icc')
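
The summary above is cut off after the first 300 characters of the make output, so the real error has to be dug out of the full log file. A minimal sketch of how one might do that in Python; the function name and the patterns are illustrative assumptions (typical GNU ld, libtool and make error lines), not part of EasyBuild:

```python
import re

# Assumed patterns for illustration: GNU ld, libtool and make error lines.
ERROR_PATTERNS = [
    re.compile(r"^ld: cannot find -l(\S+)"),
    re.compile(r"^libtool:\s+error:"),
    re.compile(r"^make(\[\d+\])?: \*\*\* "),
]


def find_error_lines(log_text):
    """Return (line_number, line) pairs for lines that look like build errors."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        stripped = line.strip()
        if any(pattern.search(stripped) for pattern in ERROR_PATTERNS):
            hits.append((number, stripped))
    return hits
```

Reading the log file named in the "Results of the build" line and passing its contents to find_error_lines() would list the candidate error lines together with their line numbers.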

The OpenMPI error log file contains near the end:

  CCLD     libopen-pal.la
ld: cannot find -lpciaccess
ld: cannot find -lxml2
make[2]: *** [libopen-pal.la] Error 1
make[2]: Leaving directory `/home/modules/build/OpenMPI/2.1.3/iccifort-2018.2.199-GCC-6.4.0-2.28/openmpi-2.1.3/opal'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/modules/build/OpenMPI/2.1.3/iccifort-2018.2.199-GCC-6.4.0-2.28/openmpi-2.1.3/opal'
make: *** [all-recursive] Error 1
 (at easybuild/tools/run.py:501 in parse_cmd_output)
== 2018-04-30 11:03:17,152 easyblock.py:2702 WARNING build failed (first 300 chars): cmd " make -j 48 " exited with exit code 2 and output:
Making all in config
make[1]: Entering directory `/home/modules/build/OpenMPI/2.1.3/iccifort-2018.2.199-GCC-6.4.0-2.28/openmpi-2.1.3/config'
make[1]: Nothing to be done for `all'.
make[1]: Leaving directory `/home/modules/build/OpenMPI/2.1.3/icc
== 2018-04-30 11:03:17,152 easyblock.py:280 INFO Closing log for application name OpenMPI version 2.1.3
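
For context, "ld: cannot find -lNAME" means the linker found neither libNAME.so nor libNAME.a in its search path; a versioned runtime library such as libNAME.so.N alone does not satisfy it. The unversioned files typically come from development packages. A minimal sketch, assuming a 64-bit Linux layout, for checking whether the two libraries are visible at link time (the helper name and the directory list are assumptions for illustration):

```python
import os


def find_link_library(name, search_dirs):
    """Return the first lib<name>.so or lib<name>.a found in search_dirs, else None."""
    for directory in search_dirs:
        for suffix in (".so", ".a"):
            candidate = os.path.join(directory, "lib" + name + suffix)
            if os.path.exists(candidate):
                return candidate
    return None


# Assumed default library directories; adjust for the distribution in use.
DEFAULT_DIRS = ["/usr/lib64", "/lib64", "/usr/lib", "/lib"]

if __name__ == "__main__":
    # LIBRARY_PATH entries are searched by the compiler driver before the defaults.
    extra = [d for d in os.environ.get("LIBRARY_PATH", "").split(":") if d]
    for lib in ("pciaccess", "xml2"):
        print(lib, "->", find_link_library(lib, extra + DEFAULT_DIRS))
```

A result of None for either name on the build node would be consistent with the two ld errors in the log.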


Question: Can anyone point to the cause of this error? Did OpenMPI 2.1.3 introduce this error?


Extra information: Issue https://github.com/easybuilders/easybuild-easyconfigs/issues/5805 is fixed in EB 3.6.0 with OpenMPI 2.1.3. On our systems with Mellanox Infiniband software installed, the following line in OpenMPI-2.1.3-iccifort-2018.2.199-GCC-6.4.0-2.28.eb *does* fix issue 5805:

configopts += '--without-ucx ' # hard disable UCX, to dance around bug (https://github.com/open-mpi/ompi/issues/4345)

On the Intel Omni-Path system I tried commenting out this line, but the build still fails with the same error.
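
As an illustration of where this option lives: easyconfigs use Python syntax, and configopts is a plain string of extra arguments for the configure step. A minimal sketch; the starting value is a placeholder and not the content of the actual OpenMPI easyconfig:

```python
# Placeholder starting value (assumption); the real easyconfig sets its own options.
configopts = '--enable-shared '

# hard disable UCX, to dance around bug (https://github.com/open-mpi/ompi/issues/4345)
configopts += '--without-ucx '

# With the line above commented out, no UCX option reaches configure at all.
print(configopts)
```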

/Ole


On 03/05/2018 03:44 PM, Åke Sandgren wrote:
To clarify, it's a bug in the OpenMPI configure script when dealing with
UCX which they haven't fixed.

On 03/05/2018 03:36 PM, Balázs Hajgató wrote:
Dear Ole,

use
configopts += '--without-ucx '

in the OpenMPI easyconfig

Sincerely,

Balazs


Thanks for any suggestions!



--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark,
Building 307, DK-2800 Kongens Lyngby, Denmark
E-mail: [email protected]
Homepage: http://dcwww.fysik.dtu.dk/~ohnielse/
Tel: (+45) 4525 3187 / Mobile (+45) 5180 1620

On 05/03/2018 15:27, Ole Holm Nielsen wrote:
Using EB 3.5.2 I'm trying to build the latest iomkl:

   eb iomkl-2018a.eb -r

This works like a charm on 2 of our 3 binary architectures, but on our
Sandy Bridge nodes with Mellanox Infiniband the build aborts:

# eb iomkl-2018a.eb -r
== temporary log file in case of crash /tmp/eb-Dnjmix/easybuild-6WeIK9.log
== resolving dependencies ...
== processing EasyBuild easyconfig /home/modules/software/EasyBuild/3.5.2/lib/python2.7/site-packages/easybuild_easyconfigs-3.5.2-py2.7.egg/easybuild/easyconfigs/o/OpenMPI/OpenMPI-2.1.2-iccifort-2018.1.163-GCC-6.4.0-2.28.eb
== building and installing OpenMPI/2.1.2-iccifort-2018.1.163-GCC-6.4.0-2.28...
== fetching files...
== creating build dir, resetting environment...
== unpacking...
== patching...
== preparing...
== configuring...
== building...
== FAILED: Installation ended unsuccessfully (build directory: /home/modules/build/OpenMPI/2.1.2/iccifort-2018.1.163-GCC-6.4.0-2.28): build failed (first 300 chars): cmd " make -j 16 " exited with exit code 2 and output:
Making all in config
make[1]: Entering directory `/home/modules/build/OpenMPI/2.1.2/iccifort-2018.1.163-GCC-6.4.0-2.28/openmpi-2.1.2/config'
make[1]: Nothing to be done for `all'.
make[1]: Leaving directory `/home/modules/build/OpenMPI/2.1.2/icc
...

The end of the log file
/tmp/eb-Dnjmix/easybuild-OpenMPI-2.1.2-20180305.144907.SAFtP.log
contains:

libtool:   error: require no space between '-L' and '-lrt'
make[2]: *** [libmca_pml_ucx.la] Error 1
make[2]: Leaving directory `/home/modules/build/OpenMPI/2.1.2/iccifort-2018.1.163-GCC-6.4.0-2.28/openmpi-2.1.2/ompi/mca/pml/ucx'
make[1]: *** [all-recursive] Error 1

I can't make out what's causing this.  Perhaps the build server has a
library installed triggering an error?
