Dear Yann,

On 26/04/2018 15:49, Yann Sagon wrote:
> Dear Mikael, dear Kenneth,
>
> I tried to compile with only with_jemalloc = False and it failed.
>
> Using Mikael's patch, it's working fine! And yes, it's CentOS 6.9, good guess.

Excellent, thanks for the feedback.

Just to be clear: did you use "with_jemalloc = False" in combination with the patch, or just the patch by itself?

The patch has been included in the just released EasyBuild v3.6.0 btw,
I managed to squeeze that in at the very last minute...


> Many thanks, I'm very happy that I won't have to fight to install Bazel and TensorFlow manually anymore!

After spending many hours fighting Bazel myself, big +1 :)


> I'm trying to compile TF with CUDA support as well, but I'm not sure whether I'm proceeding correctly.
>
> I have added in the eb file:
>
> under dependencies: ('cuDNN', '6.0-CUDA-8.0.61', '', True)
>
> and cuda_compute_capabilities = ['6.0']
>
> Do I have to enable CUDA support in Bazel as well?

No, the TensorFlow easyblock detects when CUDA/cuDNN have been included as dependencies, and then should do the right thing.
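
Roughly, the relevant part of the easyconfig would then look like the fragment below. This is only a sketch: easyconfig files use Python syntax, so it is plain Python, and just the cuDNN dependency line and the cuda_compute_capabilities value are taken from this thread. The other entries of the dependencies list from the existing TensorFlow easyconfig are left out here, and the comment on the fourth tuple element reflects my understanding of the dependency format rather than anything stated above.

```python
# Fragment of a TensorFlow easyconfig (easyconfigs are written in Python syntax).
# Only the two settings discussed in this thread are shown.

dependencies = [
    # ... the other dependencies from the existing TensorFlow easyconfig go here ...
    # Format: (name, version, versionsuffix, toolchain).
    # Assumption: True as fourth element means "built with the system (dummy) toolchain".
    ('cuDNN', '6.0-CUDA-8.0.61', '', True),
]

# GPU compute capabilities to build for; '6.0' is the value used in this thread.
cuda_compute_capabilities = ['6.0']
```

No separate Bazel setting appears in the fragment, since the easyblock is expected to pick up CUDA/cuDNN from the dependencies.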


> Here is the build log when trying to build for CUDA: https://gist.github.com/ysagon/a205242bb91becaccf4b684d5285514d

This is the actual error:

Cuda Configuration Error: Cannot find libdevice.10.bc under /opt/ebsofts/Core/CUDA/8.0.61

That doesn't ring a bell for me, but maybe some CUDA drivers are missing?
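
One way to narrow this down might be to check which libdevice files are actually present under the CUDA installation and compare their names with the one in the error message. The small script below is a sketch for that; the helper name find_libdevice and the 'libdevice*.bc' file pattern are my own choices for illustration, and the installation prefix is simply the path from the error above.

```python
import fnmatch
import os


def find_libdevice(cuda_root):
    """Return sorted paths of all libdevice*.bc files found under cuda_root."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(cuda_root):
        for name in fnmatch.filter(filenames, 'libdevice*.bc'):
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)


if __name__ == '__main__':
    # CUDA installation prefix taken from the error message
    root = '/opt/ebsofts/Core/CUDA/8.0.61'
    found = find_libdevice(root)
    if found:
        print('\n'.join(found))
    else:
        print('no libdevice*.bc files found under %s' % root)
```

If this prints files with different names than libdevice.10.bc, the problem is more likely a mismatch between what the TensorFlow configure step looks for and what this CUDA version ships than an incomplete installation.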


regards,

Kenneth

2018-04-25 21:45 GMT+02:00 Kenneth Hoste <[email protected]>:

    Hi all,

    On 25/04/2018 21:21, Mikael Öhman wrote:
    > Hi Yann,
    >
    > A bit of a shot in the dark here, but does this happen to be a CentOS6
    > machine?
    > If so, I had to set "with_jemalloc = False" and apply the lrt-flag patch
    > https://github.com/easybuilders/easybuild-easyconfigs/pull/6089 (which I
    > hope will make it into the next release? still waiting approval though)

    Hmm, I lost track of that one, I'll see what I can do to squeeze it
    into EasyBuild v3.6.0...

    > (if I recall correctly, jemalloc was due to the old kernel missing some
    > feature (and disabling jemalloc was the easiest fix), and the -lrt flag
    > was related to some behavior in the linker in combination with Bazel not
    > passing on necessary link flags)
    >
    > Though, I couldn't actually spot any actual error message in the entire
    > log you pasted.

    The error is pretty clear at the end of the log:

    external/jemalloc/src/pages.c: In function 'je_pages_huge':
    external/jemalloc/src/pages.c:203:30: error: 'MADV_HUGEPAGE' undeclared (first use in this function)
        return (madvise(addr, size, MADV_HUGEPAGE) != 0);
                                    ^~~~~~~~~~~~~
    external/jemalloc/src/pages.c:203:30: note: each undeclared identifier is reported only once for each function it appears in
    external/jemalloc/src/pages.c: In function 'je_pages_nohuge':
    external/jemalloc/src/pages.c:217:30: error: 'MADV_NOHUGEPAGE' undeclared (first use in this function)
        return (madvise(addr, size, MADV_NOHUGEPAGE) != 0);
                                    ^~~~~~~~~~~~~~~

    This does indeed look like you'll need to disable jemalloc support, by
    including this in the easyconfig file:

             with_jemalloc = False

    Please let us know whether that helps, and whether you need to apply
    Mikael's patch as well or not.


    regards,

    Kenneth

    > Best regards, Mikael
    >
    > On Wed, Apr 25, 2018 at 6:01 PM, Yann Sagon <[email protected]> wrote:
    >
    >     Dear Kenneth,
    >
    >     Thanks for the links. I tried with the tf in develop branch, it
    >     still doesn't compile successfully.
    >
    >     Here is a full log with debug:
    >
    >     https://gist.github.com/ysagon/30ecfee7789d6cf8f9304103fcdb539f
    >
    >     Best
    >
    >     2018-04-25 15:17 GMT+02:00 Kenneth Hoste <[email protected]>:
     >
     >         Dear Yann,
     >
     >         This does not show the actual error that occurred, can you
     >         provide a full (debug) log?
     >
     >         Note that we have an easyconfig for TensorFlow 1.7.0 with
     >         foss/2018a as well in the develop branch of the repository [1],
     >         and there have been some small fixes to the TensorFlow
     >         easyblock as well [2].
     >
     >         All this will be included in the upcoming EasyBuild release,
     >         due for later this week.
     >
     >
     >         regards,
     >
     >         Kenneth
     >
     >         [1]
     >         https://github.com/easybuilders/easybuild-easyconfigs/tree/develop/easybuild/easyconfigs/t/TensorFlow
     >         [2]
     >         https://github.com/easybuilders/easybuild-easyblocks/blob/develop/easybuild/easyblocks/t/tensorflow.py
     >
     >         On 25/04/2018 15:12, Yann Sagon wrote:
     >          > Dear list,
     >          >
     >          > I'm very happy to see that TF is now available in eb as
     >          > compiled from source, not only the whl.
     >          >
     >          > Unfortunately, I have an error when trying to build:
     >          >
     >          >
     >          > /opt/ebsofts/Core/GCCcore/6.4.0/bin/gcc -U_FORTIFY_SOURCE
     >          > -fstack-protector -Wall -B/opt/ebsofts/Core/GCCcore/6.4.0/bin
     >          > -B/opt/ebsofts/Compiler/GCCcore/6.4.0/binutils/2.28/bin
     >          > -Wunused-but-set-parameter -Wno-free-nonheap-object
     >          > -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG
     >          > -ffunction-sections -fdata-sections
     >          > -DGEMMLOWP_ALLOW_SLOW_SCALAR_FALLBACK -O2 '-march=core2' -MD -MF
     >          > bazel-out/k8-py3-opt/bin/external/jpeg/_objs/jpeg/external/jpeg/jcphuff.d
     >          > -iquote external/jpeg -iquote
     >          > bazel-out/k8-py3-opt/genfiles/external/jpeg
     >          > -iquote external/bazel_tools -iquote
     >          > bazel-out/k8-py3-opt/genfiles/external/bazel_tools -isystem
     >          > external/bazel_tools/tools/cpp/gcc3 -O3 -w
     >          > -fno-canonical-system-headers
     >          > -Wno-builtin-macro-redefined '-D__DATE__="redacted"'
     >          > '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c
     >          > external/jpeg/jcphuff.c -o
     >          > bazel-out/k8-py3-opt/bin/external/jpeg/_objs/jpeg/external/jpeg/jcphuff.o)
     >          > Target //tensorflow/tools/pip_package:build_pip_package
     >          > failed to build
     >          > INFO: Elapsed time: 101.289s, Critical Path: 61.90s
     >          > FAILED: Build did NOT complete successfully
     >          >   (at easybuild/tools/run.py:481 in parse_cmd_output)
     >          >
     >          > I'm not able to say why it's not working. Any clue?
     >          >
     >          > Best
     >          >
     >
     >
     >

