Hello all,
Yesterday I had exactly the same problems with jemalloc and the -lrt flag
on CentOS 6.9, so thanks!
Additionally, I had to add libpng as a dependency because of this:
bazel-out/host/bin/tensorflow/python/gen_sdca_ops_py_wrappers_cc:
symbol lookup error:
/dev/shm/tmp/eb-JSJtbU/tmpdYLcXD-bazel-build/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/python/../../_solib_k8/_U_S_Stensorflow_Spython_Cgen_Usdca_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so:
undefined symbol: png_set_longjmp_fn
which seems to need libpng >=1.5.
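
A quick way to check whether the libpng that the loader picks up exports
this symbol is sketched below. It is only a sketch: the helper name
libpng_has_symbol is made up for illustration, and it inspects whatever
libpng ctypes finds on the machine where it runs, which is not necessarily
the one Bazel links against during the build.

```python
import ctypes
import ctypes.util


def libpng_has_symbol(symbol="png_set_longjmp_fn", library="png"):
    """Return True/False if the libpng found by the loader exports `symbol`,
    or None when no such library can be located or loaded."""
    path = ctypes.util.find_library(library)
    if path is None:
        return None  # no libpng visible to the dynamic loader
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None  # library found but could not be loaded
    # Looking up a missing symbol on a CDLL raises AttributeError,
    # so hasattr() reports whether the symbol is exported.
    return hasattr(lib, symbol)


if __name__ == "__main__":
    print("png_set_longjmp_fn available:", libpng_has_symbol())
```

If this prints False for the system libpng, adding libpng as an explicit
dependency in the easyconfig (as described above) is the likely fix.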
Cheers,
Miguel
On 26/04/18 21:49, Yann Sagon wrote:
Dear Mikael, dear Kenneth,
I tried to compile with only with_jemalloc = False and it failed.
Using Mikael's patch, it's working fine! And yes, it's a CentOS 6.9,
good guess.
Many thanks, I'm very happy that I won't have to fight to install
Bazel and Tensorflow manually anymore!
I'm trying to compile TF with CUDA support as well, but I'm not
sure I'm proceeding correctly.
I have added in the eb file:
under dependencies: ('cuDNN', '6.0-CUDA-8.0.61', '', True)
and cuda_compute_capabilities = ['6.0']
Do I have to enable cuda support in Bazel as well?
here is the build log when trying to build for cuda:
https://gist.github.com/ysagon/a205242bb91becaccf4b684d5285514d
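
For reference, the two settings described above would look roughly like
this in the easyconfig file (easyconfigs use Python syntax). This is a
fragment only: the other entries of the dependencies list are left out
and would have to be kept, and it says nothing about whether anything
else needs to be enabled on the Bazel side.

```python
# Fragment of a TensorFlow easyconfig; only the CUDA-related settings
# mentioned in this message are shown.
dependencies = [
    # (name, version, versionsuffix, True): the trailing True marks a
    # dependency that was installed with the system (dummy) toolchain
    ('cuDNN', '6.0-CUDA-8.0.61', '', True),
]

# GPU architectures to build for; 6.0 is the value used in this message
cuda_compute_capabilities = ['6.0']
```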
2018-04-25 21:45 GMT+02:00 Kenneth Hoste <[email protected]>:
Hi all,
On 25/04/2018 21:21, Mikael Öhman wrote:
> Hi Yann,
>
> A bit of a shot in the dark here, but does this happen to be a CentOS 6
> machine?
> If so, I had to set "with_jemalloc = False" and apply the -lrt flag patch
> https://github.com/easybuilders/easybuild-easyconfigs/pull/6089
> (which I hope will make it into the next release? Still waiting for
> approval though)
Hmm, I lost track of that one, I'll see what I can do to squeeze it into
EasyBuild v3.6.0...
>
> (if I recall correctly, jemalloc was due to the old kernel missing some
> feature (and disabling jemalloc was the easiest fix), and the -lrt flag
> was related to some behavior in the linker in combination with Bazel not
> passing on necessary link flags)
>
> Though, I couldn't actually spot any actual error message in the entire
> log you pasted.
The error is pretty clear at the end of the log:
external/jemalloc/src/pages.c: In function 'je_pages_huge':
external/jemalloc/src/pages.c:203:30: error: 'MADV_HUGEPAGE' undeclared (first use in this function)
  return (madvise(addr, size, MADV_HUGEPAGE) != 0);
                              ^~~~~~~~~~~~~
external/jemalloc/src/pages.c:203:30: note: each undeclared identifier is reported only once for each function it appears in
external/jemalloc/src/pages.c: In function 'je_pages_nohuge':
external/jemalloc/src/pages.c:217:30: error: 'MADV_NOHUGEPAGE' undeclared (first use in this function)
  return (madvise(addr, size, MADV_NOHUGEPAGE) != 0);
                              ^~~~~~~~~~~~~~~
This does indeed look like you'll need to disable jemalloc support, by
including this in the easyconfig file:
with_jemalloc = False
Please let us know whether that helps, and whether you need to apply
Mikael's patch as well or not.
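
One way to check up front whether a machine is affected is to look for
the macro in the system headers. The sketch below does that with a plain
text search. The header paths are assumptions (they differ between
distributions and glibc versions), and the helper name defines_macro is
made up for illustration.

```python
import re
from pathlib import Path

# Assumed header locations; adjust for the distribution at hand.
CANDIDATE_HEADERS = [
    "/usr/include/bits/mman.h",
    "/usr/include/bits/mman-linux.h",
    "/usr/include/x86_64-linux-gnu/bits/mman-linux.h",
    "/usr/include/asm-generic/mman-common.h",
]


def defines_macro(header_paths, macro="MADV_HUGEPAGE"):
    """Return True if any readable header contains '#define <macro>'."""
    pattern = re.compile(
        r"^\s*#\s*define\s+" + re.escape(macro) + r"\b", re.MULTILINE
    )
    for path in header_paths:
        try:
            text = Path(path).read_text(errors="replace")
        except OSError:
            continue  # header not present or not readable on this system
        if pattern.search(text):
            return True
    return False


if __name__ == "__main__":
    found = defines_macro(CANDIDATE_HEADERS)
    print("MADV_HUGEPAGE defined in system headers:", found)
    if not found:
        print("jemalloc will likely fail as above; "
              "consider with_jemalloc = False")
```

A negative result is only a hint, since the macro may live in a header
that is not in the list.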
regards,
Kenneth
>
> Best regards, Mikael
>
>
>
> On Wed, Apr 25, 2018 at 6:01 PM, Yann Sagon <[email protected]> wrote:
>
> Dear Kenneth,
>
> Thanks for the links. I tried with the TF easyconfig in the develop
> branch, but it still doesn't compile successfully.
>
> Here is a full log with debug:
>
> https://gist.github.com/ysagon/30ecfee7789d6cf8f9304103fcdb539f
>
> Best
>
>
> 2018-04-25 15:17 GMT+02:00 Kenneth Hoste <[email protected]>:
>
> Dear Yann,
>
> This does not show the actual error that occurred, can you provide a
> full (debug) log?
>
> Note that we have an easyconfig for TensorFlow 1.7.0 with foss/2018a as
> well in the develop branch of the repository [1], and there have been
> some small fixes to the TensorFlow easyblock as well [2].
>
> All this will be included in the upcoming EasyBuild release, due for
> later this week.
>
>
> regards,
>
> Kenneth
>
> [1] https://github.com/easybuilders/easybuild-easyconfigs/tree/develop/easybuild/easyconfigs/t/TensorFlow
> [2] https://github.com/easybuilders/easybuild-easyblocks/blob/develop/easybuild/easyblocks/t/tensorflow.py
>
> On 25/04/2018 15:12, Yann Sagon wrote:
> > Dear list,
> >
> > I'm very happy to see that TF is now available in eb as compiled from
> > source, not only the whl.
> >
> > Unfortunately, I have an error when trying to build:
> >
> >
> > /opt/ebsofts/Core/GCCcore/6.4.0/bin/gcc -U_FORTIFY_SOURCE
> > -fstack-protector -Wall -B/opt/ebsofts/Core/GCCcore/6.4.0/bin
> > -B/opt/ebsofts/Compiler/GCCcore/6.4.0/binutils/2.28/bin
> > -Wunused-but-set-parameter -Wno-free-nonheap-object
> > -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG
> > -ffunction-sections -fdata-sections
> > -DGEMMLOWP_ALLOW_SLOW_SCALAR_FALLBACK -O2 '-march=core2' -MD -MF
> > bazel-out/k8-py3-opt/bin/external/jpeg/_objs/jpeg/external/jpeg/jcphuff.d
> > -iquote external/jpeg -iquote bazel-out/k8-py3-opt/genfiles/external/jpeg
> > -iquote external/bazel_tools -iquote
> > bazel-out/k8-py3-opt/genfiles/external/bazel_tools -isystem
> > external/bazel_tools/tools/cpp/gcc3 -O3 -w -fno-canonical-system-headers
> > -Wno-builtin-macro-redefined '-D__DATE__="redacted"'
> > '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c
> > external/jpeg/jcphuff.c -o
> > bazel-out/k8-py3-opt/bin/external/jpeg/_objs/jpeg/external/jpeg/jcphuff.o)
> > Target //tensorflow/tools/pip_package:build_pip_package failed to build
> > INFO: Elapsed time: 101.289s, Critical Path: 61.90s
> > FAILED: Build did NOT complete successfully
> > (at easybuild/tools/run.py:481 in parse_cmd_output)
> >
> > I'm not able to say why it's not working. Any clue?
> >
> > Best
> >