Dear Mikael, dear Kenneth,
I tried compiling with only with_jemalloc = False, and it failed.
With Mikael's patch applied as well, it works fine! And yes, it's CentOS 6.9,
good guess.
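
In case it helps others on an older distribution: a rough way to check whether the system headers define the madvise flags that jemalloc needs (the 'MADV_HUGEPAGE' undeclared error quoted below) is to search the headers for the macro. This is only a sketch. The header paths listed are assumptions and vary between distributions, and the helper names are made up for illustration.

```python
import re
from pathlib import Path

# Hypothetical candidate locations; the header that defines the madvise
# flags differs between distributions and glibc/kernel-header versions.
CANDIDATE_HEADERS = [
    "/usr/include/bits/mman.h",
    "/usr/include/bits/mman-linux.h",
    "/usr/include/asm-generic/mman-common.h",
]


def defines_macro(header_text, name):
    """Return True if header_text contains a '#define <name>' line."""
    pattern = r"^\s*#\s*define\s+" + re.escape(name) + r"\b"
    return re.search(pattern, header_text, flags=re.MULTILINE) is not None


def headers_defining(name, paths=CANDIDATE_HEADERS):
    """Return the subset of paths that exist and define the given macro."""
    found = []
    for path in map(Path, paths):
        if path.is_file() and defines_macro(path.read_text(errors="replace"), name):
            found.append(str(path))
    return found


if __name__ == "__main__":
    for macro in ("MADV_HUGEPAGE", "MADV_NOHUGEPAGE"):
        print(macro, headers_defining(macro) or "not found in the headers checked")
```

If neither macro turns up, disabling jemalloc as suggested in this thread is probably the simplest way out.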
Many thanks, I'm very happy that I won't have to fight to install Bazel and
Tensorflow manually anymore!
I'm also trying to compile TF with CUDA support, but I'm not sure I'm
proceeding correctly.
I have added to the eb file:
under dependencies: ('cuDNN', '6.0-CUDA-8.0.61', '', True)
and cuda_compute_capabilities = ['6.0']
Do I have to enable CUDA support in Bazel as well?
Here is the build log from the CUDA build attempt:
https://gist.github.com/ysagon/a205242bb91becaccf4b684d5285514d
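
As a sketch, the two additions described above would sit in the easyconfig roughly like this. Only the cuDNN entry and the cuda_compute_capabilities value come from this message. The rest of the file is omitted, and the comment on the meaning of the trailing True is an assumption about how EasyBuild reads a 4-element dependency spec.

```python
# Fragment of a TensorFlow easyconfig (easyconfig files use Python syntax).
# Only the cuDNN entry and cuda_compute_capabilities come from this message;
# all other parameters and dependencies are left out here.
dependencies = [
    # (name, version, versionsuffix, toolchain); True in the last slot is
    # assumed to mean "installed with the dummy/system toolchain".
    ('cuDNN', '6.0-CUDA-8.0.61', '', True),
]

# Compute capabilities of the target GPUs, given as a list of strings.
cuda_compute_capabilities = ['6.0']
```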
2018-04-25 21:45 GMT+02:00 Kenneth Hoste <[email protected]>:
> Hi all,
>
> On 25/04/2018 21:21, Mikael Öhman wrote:
> > Hi Yann,
> >
> > A bit of a shot in the dark here, but does this happen to be a CentOS6
> > machine?
> > If so, I had to set "with_jemalloc = False" and apply the lrt-flag patch
> > https://github.com/easybuilders/easybuild-easyconfigs/pull/6089 (which I
> > hope will make it into the next release? still waiting approval though)
>
> Hmm, I lost track of that one, I'll see what I can do to squeeze it into
> EasyBuild v3.6.0...
>
> >
> > (if I recall correctly, the jemalloc failure was due to the old kernel
> > missing some feature (and disabling jemalloc was the easiest fix), and
> > the -lrt flag was related to some behavior in the linker in combination
> > with Bazel not passing on necessary link flags)
> >
> > Though, I couldn't spot any actual error message in the entire
> > log you pasted.
>
> The error is pretty clear at the end of the log:
>
> external/jemalloc/src/pages.c: In function 'je_pages_huge':
> external/jemalloc/src/pages.c:203:30: error: 'MADV_HUGEPAGE' undeclared
> (first use in this function)
> return (madvise(addr, size, MADV_HUGEPAGE) != 0);
> ^~~~~~~~~~~~~
> external/jemalloc/src/pages.c:203:30: note: each undeclared identifier
> is reported only once for each function it appears in
> external/jemalloc/src/pages.c: In function 'je_pages_nohuge':
> external/jemalloc/src/pages.c:217:30: error: 'MADV_NOHUGEPAGE'
> undeclared (first use in this function)
> return (madvise(addr, size, MADV_NOHUGEPAGE) != 0);
> ^~~~~~~~~~~~~~~
>
> This does indeed look like you'll need to disable jemalloc support, by
> including this in the easyconfig file:
>
> with_jemalloc = False
>
> Please let us know whether that helps, and whether you need to apply
> Mikael's patch as well or not.
>
>
> regards,
>
> Kenneth
>
> >
> > Best regards, Mikael
> >
> >
> >
> > On Wed, Apr 25, 2018 at 6:01 PM, Yann Sagon <[email protected]> wrote:
> >
> > Dear Kenneth,
> >
> > Thanks for the links. I tried with the TF from the develop branch; it
> > still doesn't compile successfully.
> >
> > Here is a full log with debug:
> >
> > https://gist.github.com/ysagon/30ecfee7789d6cf8f9304103fcdb539f
> >
> > Best
> >
> >
> > 2018-04-25 15:17 GMT+02:00 Kenneth Hoste <[email protected]>:
> >
> > Dear Yann,
> >
> > This does not show the actual error that occurred, can you provide a
> > full (debug) log?
> >
> > Note that we have an easyconfig for TensorFlow 1.7.0 with foss/2018a as
> > well in the develop branch of the repository [1], and there have been
> > some small fixes to the TensorFlow easyblock as well [2].
> >
> > All this will be included in the upcoming EasyBuild release, due for
> > later this week.
> >
> >
> > regards,
> >
> > Kenneth
> >
> > [1]
> > https://github.com/easybuilders/easybuild-easyconfigs/tree/develop/easybuild/easyconfigs/t/TensorFlow
> > [2]
> > https://github.com/easybuilders/easybuild-easyblocks/blob/develop/easybuild/easyblocks/t/tensorflow.py
> >
> > On 25/04/2018 15:12, Yann Sagon wrote:
> > > Dear list,
> > >
> > > I'm very happy to see that TF is now available in eb as compiled from
> > > source, not only the whl.
> > >
> > > Unfortunately, I have an error when trying to build:
> > >
> > >
> > > /opt/ebsofts/Core/GCCcore/6.4.0/bin/gcc -U_FORTIFY_SOURCE
> > > -fstack-protector -Wall -B/opt/ebsofts/Core/GCCcore/6.4.0/bin
> > > -B/opt/ebsofts/Compiler/GCCcore/6.4.0/binutils/2.28/bin
> > > -Wunused-but-set-parameter -Wno-free-nonheap-object
> > > -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG
> > > -ffunction-sections -fdata-sections
> > > -DGEMMLOWP_ALLOW_SLOW_SCALAR_FALLBACK -O2 '-march=core2' -MD -MF
> > > bazel-out/k8-py3-opt/bin/external/jpeg/_objs/jpeg/external/jpeg/jcphuff.d
> > > -iquote external/jpeg -iquote
> > > bazel-out/k8-py3-opt/genfiles/external/jpeg
> > > -iquote external/bazel_tools -iquote
> > > bazel-out/k8-py3-opt/genfiles/external/bazel_tools -isystem
> > > external/bazel_tools/tools/cpp/gcc3 -O3 -w
> > > -fno-canonical-system-headers
> > > -Wno-builtin-macro-redefined '-D__DATE__="redacted"'
> > > '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c
> > > external/jpeg/jcphuff.c -o
> > > bazel-out/k8-py3-opt/bin/external/jpeg/_objs/jpeg/external/jpeg/jcphuff.o)
> > > Target //tensorflow/tools/pip_package:build_pip_package failed to build
> > > INFO: Elapsed time: 101.289s, Critical Path: 61.90s
> > > FAILED: Build did NOT complete successfully
> > > (at easybuild/tools/run.py:481 in parse_cmd_output)
> > >
> > > I'm not able to say why it's not working. Any clue?
> > >
> > > Best
> > >
> >
> >
> >
>