Hello all,

Yesterday I ran into exactly the same problems with jemalloc and the -lrt flag on CentOS 6.9, so thanks!

Additionally, I had to add libpng as a dependency because of this:

   bazel-out/host/bin/tensorflow/python/gen_sdca_ops_py_wrappers_cc:
   symbol lookup error:
   /dev/shm/tmp/eb-JSJtbU/tmpdYLcXD-bazel-build/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/python/../../_solib_k8/_U_S_Stensorflow_Spython_Cgen_Usdca_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so:
   undefined symbol: png_set_longjmp_fn

which seems to need libpng >=1.5.
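
A quick way to see whether a given libpng provides that symbol is to load it and look the symbol up. The following is only a minimal sketch: the helper name exports_symbol is made up for illustration, and which libpng gets found depends on the environment the script runs in (system library versus a loaded module).

```python
import ctypes
import ctypes.util


def exports_symbol(library_name, symbol):
    """Return True/False depending on whether the library exports the symbol.

    Returns None when the library cannot be located or loaded at all.
    (Hypothetical helper, not part of EasyBuild or TensorFlow.)
    """
    path = ctypes.util.find_library(library_name)
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    # Attribute lookup on a CDLL raises AttributeError for unknown symbols,
    # so hasattr() reports whether the symbol is exported.
    return hasattr(lib, symbol)


if __name__ == "__main__":
    # The symbol named in the lookup error above.
    print(exports_symbol("png", "png_set_longjmp_fn"))
```

If this reports False for the libpng that the build picks up, adding a newer libpng as a dependency, as described above, is the likely remedy.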

Cheers,

Miguel


On 26/04/18 21:49, Yann Sagon wrote:
Dear Mikael, dear Kenneth,

I tried to compile with only with_jemalloc = False and it failed.

Using Mikael's patch, it's working fine! And yes, it's a CentOS 6.9, good guess.

Many thanks, I'm very happy that I won't have to fight to install Bazel and Tensorflow manually anymore!

I'm also trying to compile TF with CUDA support, but I'm not sure I'm going about it correctly.

I have added in the eb file:

under dependencies: ('cuDNN', '6.0-CUDA-8.0.61', '', True)

and cuda_compute_capabilities = ['6.0']
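
For reference, the two additions described above might sit in the easyconfig roughly as follows. This is only a sketch: easyconfig files use Python syntax, the placeholder comment stands for whatever dependencies the easyconfig already lists, and only the cuDNN tuple and the compute capability value are taken from this message.

```python
# Fragment of a TensorFlow easyconfig (sketch, not a complete file).
dependencies = [
    # ... the dependencies already listed in the easyconfig stay here ...
    ('cuDNN', '6.0-CUDA-8.0.61', '', True),  # tuple exactly as quoted above
]

# Target GPU architecture(s), as quoted above.
cuda_compute_capabilities = ['6.0']
```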

Do I have to enable cuda support in Bazel as well?

Here is the build log from my attempt to build with CUDA: https://gist.github.com/ysagon/a205242bb91becaccf4b684d5285514d






2018-04-25 21:45 GMT+02:00 Kenneth Hoste <[email protected]>:

    Hi all,

    On 25/04/2018 21:21, Mikael Öhman wrote:
    > Hi Yann,
    >
    > A bit of a shot in the dark here, but does this happen to be a CentOS6
    > machine?
    > If so, I had to set "with_jemalloc = False" and apply the lrt-flag patch
    > https://github.com/easybuilders/easybuild-easyconfigs/pull/6089
    > (which I hope will make it into the next release? still waiting approval
    > though)
    Hmm, I lost track of that one, I'll see what I can do to squeeze it into
    EasyBuild v3.6.0...

    >
    > (If I recall correctly, the jemalloc failure was due to the old kernel
    > missing some feature (and disabling jemalloc was the easiest fix), and the
    > -lrt flag was related to some behavior in the linker in combination with
    > Bazel not passing on necessary link flags.)
    >
    > Though I couldn't spot any actual error message in the entire
    > log you pasted.

    The error is pretty clear at the end of the log:

    external/jemalloc/src/pages.c: In function 'je_pages_huge':
    external/jemalloc/src/pages.c:203:30: error: 'MADV_HUGEPAGE' undeclared (first use in this function)
       return (madvise(addr, size, MADV_HUGEPAGE) != 0);
                                   ^~~~~~~~~~~~~
    external/jemalloc/src/pages.c:203:30: note: each undeclared identifier is reported only once for each function it appears in
    external/jemalloc/src/pages.c: In function 'je_pages_nohuge':
    external/jemalloc/src/pages.c:217:30: error: 'MADV_NOHUGEPAGE' undeclared (first use in this function)
       return (madvise(addr, size, MADV_NOHUGEPAGE) != 0);
                                   ^~~~~~~~~~~~~~~

    This does indeed look like you'll need to disable jemalloc support, by
    including this in the easyconfig file:

            with_jemalloc = False

    Please let us know whether that helps, and whether you need to apply
    Mikael's patch as well or not.
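
The compiler errors quoted above come down to two constants having no definition in the headers seen by the build. A minimal sketch of checking header text for them ahead of time follows. The helper name missing_madvise_flags is hypothetical, the sample strings are made up for illustration, and which header file to read on a real system is left open because its location varies between distributions.

```python
import re

# The two constants the jemalloc build in the log could not find.
REQUIRED_FLAGS = ("MADV_HUGEPAGE", "MADV_NOHUGEPAGE")


def missing_madvise_flags(header_text, names=REQUIRED_FLAGS):
    """Return the names that have no '#define' line in the given header text."""
    missing = []
    for name in names:
        # Accept both '#define NAME' and '# define NAME' spellings.
        pattern = r"^\s*#\s*define\s+" + re.escape(name) + r"\b"
        if not re.search(pattern, header_text, flags=re.MULTILINE):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Illustrative header text only, not copied from a real system.
    sample = "# define MADV_DONTFORK 10\n"
    missing = missing_madvise_flags(sample)
    # An empty list would suggest jemalloc's pages.c can compile as-is.
    print("with_jemalloc =", not missing, "| missing:", missing)
```

When the list is non-empty, setting with_jemalloc = False as suggested above sidesteps the problem.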


    regards,

    Kenneth

    >
    > Best regards, Mikael
    >
    >
    >
    > On Wed, Apr 25, 2018 at 6:01 PM, Yann Sagon <[email protected]> wrote:
    >
    >     Dear Kenneth,
    >
    >     Thanks for the links. I tried with the TF from the develop branch, but it
    >     still doesn't compile successfully.
    >
    >     Here is a full log with debug:
    >
    >     https://gist.github.com/ysagon/30ecfee7789d6cf8f9304103fcdb539f
    >
    >     Best
    >
    >
    >     2018-04-25 15:17 GMT+02:00 Kenneth Hoste <[email protected]>:
    >
    >         Dear Yann,
    >
    >         This does not show the actual error that occurred, can you provide a
    >         full (debug) log?
    >
    >         Note that we have an easyconfig for TensorFlow 1.7.0 with foss/2018a as
    >         well in the develop branch of the repository [1], and there have been
    >         some small fixes to the TensorFlow easyblock as well [2].
    >
    >         All this will be included in the upcoming EasyBuild release, due for
    >         later this week.
    >
    >
    >         regards,
    >
    >         Kenneth
    >
    >         [1] https://github.com/easybuilders/easybuild-easyconfigs/tree/develop/easybuild/easyconfigs/t/TensorFlow
    >         [2] https://github.com/easybuilders/easybuild-easyblocks/blob/develop/easybuild/easyblocks/t/tensorflow.py
    >
    >         On 25/04/2018 15:12, Yann Sagon wrote:
    >          > Dear list,
    >          >
    >          > I'm very happy to see that TF is now available in eb as compiled from
    >          > source, not only the whl.
    >          >
    >          > Unfortunately, I have an error when trying to build:
    >          >
    >          >
    >          > /opt/ebsofts/Core/GCCcore/6.4.0/bin/gcc -U_FORTIFY_SOURCE
    >          > -fstack-protector -Wall -B/opt/ebsofts/Core/GCCcore/6.4.0/bin
    >          > -B/opt/ebsofts/Compiler/GCCcore/6.4.0/binutils/2.28/bin
    >          > -Wunused-but-set-parameter -Wno-free-nonheap-object
    >          > -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG
    >          > -ffunction-sections -fdata-sections
    >          > -DGEMMLOWP_ALLOW_SLOW_SCALAR_FALLBACK -O2 '-march=core2' -MD -MF
    >          > bazel-out/k8-py3-opt/bin/external/jpeg/_objs/jpeg/external/jpeg/jcphuff.d
    >          > -iquote external/jpeg -iquote bazel-out/k8-py3-opt/genfiles/external/jpeg
    >          > -iquote external/bazel_tools -iquote
    >          > bazel-out/k8-py3-opt/genfiles/external/bazel_tools -isystem
    >          > external/bazel_tools/tools/cpp/gcc3 -O3 -w -fno-canonical-system-headers
    >          > -Wno-builtin-macro-redefined '-D__DATE__="redacted"'
    >          > '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c
    >          > external/jpeg/jcphuff.c -o
    >          > bazel-out/k8-py3-opt/bin/external/jpeg/_objs/jpeg/external/jpeg/jcphuff.o)
    >          > Target //tensorflow/tools/pip_package:build_pip_package failed to build
    >          > INFO: Elapsed time: 101.289s, Critical Path: 61.90s
    >          > FAILED: Build did NOT complete successfully
    >          >   (at easybuild/tools/run.py:481 in parse_cmd_output)
    >          >
    >          > I'm not able to say why it's not working. Any clue?
    >          >
    >          > Best
    >          >
    >
    >
    >


