Hi again, Kenneth.

It turns out that I was wrong about the lack of internet access from the 
compute nodes.  In principle, nothing should stop me from testing a GPU 
build next week, except for my lack of knowledge :-)

I see this in the easyblock:

    def extra_options():
        extra_vars = {
            # see https://developer.nvidia.com/cuda-gpus
            'cuda_compute_capabilities': [[], "List of CUDA compute capabilities to build with", CUSTOM],
            'with_mkl_dnn': [True, "Make TensorFlow use Intel MKL-DNN", CUSTOM],
        }

Does that mean that I can call eb with something like this:

    eb TensorFlow-1.4.0-foss-2017b-Python-3.6.3.eb -r --cuda_compute_capabilities=Tesla

or something like that (I will not be able to test it until next week)?  Or do 
I need to make a new easyconfig which sets that extra option somehow (and 
depends on CUDA and friends)?
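
For the second option, here is a minimal, untested sketch of what the relevant part of such an easyconfig might look like. It assumes that a custom parameter declared in extra_options() can be set as a plain variable in the .eb file, and that its entries are compute capability version strings as listed at https://developer.nvidia.com/cuda-gpus, rather than a product name such as "Tesla". The values '3.5' and '6.0' are placeholders. The CUDA dependencies are left out because their versions are not known here.

```python
# Sketch of a GPU-enabled easyconfig fragment (.eb files use Python syntax).
# Only the name, version, suffix and toolchain come from the command above;
# everything marked "assumption" is illustrative.
name = 'TensorFlow'
version = '1.4.0'
versionsuffix = '-Python-3.6.3'

toolchain = {'name': 'foss', 'version': '2017b'}

# Custom parameter declared in the easyblock's extra_options().
# Assumption: entries are compute capability versions written as strings.
# '3.5' and '6.0' are placeholders; use the values matching your GPUs.
cuda_compute_capabilities = ['3.5', '6.0']
```

Whether the same value can also be passed on the eb command line is something I have not verified.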

Best regards

Jakob



> On 5 Jan 2018, at 16:10, Jakob Schiøtz <[email protected]> wrote:
> 
> 
> 
>> On 5 Jan 2018, at 15:18, Kenneth Hoste <[email protected]> wrote:
>> 
>> On 05/01/2018 14:13, Jakob Schiøtz wrote:
>>> Hi again,
>>> 
>>> Yes, I have overlooked that - I just switched my repo to your branch and 
>>> tried to build :-)
>>> 
>>> Now I get an error when building TensorFlow.  It is a 502 Bad Gateway, 
>>> indicating that some server is down somewhere.  But is it not a problem 
>>> that the build process itself tried to download extra stuff in addition to 
>>> the source files listed in the .eb file?  At least it makes the checksum 
>>> checking moot.
>> 
>> That's indeed a problem, but one that is hard to avoid with TensorFlow, at 
>> least in a first iteration...
>> 
>> Once we're happy with the current approach, a new target could be to get 
>> TensorFlow to build "offline".
>> 
>> One step at a time though... ;-)
> 
> It could be a showstopper for me, though.  On our cluster, only two nodes 
> have GPUs.  With the binary build, I could only install TensorFlow on those, 
> since although CUDA and friends are available on all the nodes, you can only 
> load the resulting TensorFlow module on a machine with a GPU.  Unfortunately, 
> these two nodes are officially compute-nodes, not login-nodes, and that means 
> that they are cut off from the Internet.  So no downloading is possible on 
> these. :-(
> 
> So I have two questions:
> 
> 1. What do we expect to gain by building from source instead of installing 
> from the wheel? 
> 
> 2. Would it be OK to have a “-bin” variant installing from the binary 
> distribution until we get these issues ironed out?
> 
> In my second attempt, I managed to build with foss/2017b (obviously the 
> server was up again).  I have not really tested it yet (I am only just 
> dabbling in TensorFlow and my main application is crashing due to another 
> problem).  Do you want me to submit the new .eb file as a PR to your PR?  Or 
> should I just wait till your stuff has converged?
> 
> /Jakob
> 
> 
>> 
>> 
>> regards,
>> 
>> Kenneth
>>> 
>>> Best regards
>>> 
>>> Jakob
>>> 
>>> 
>>> ............
>>> WARNING: The lower priority option '-c opt' does not override the previous 
>>> value '--compilation_mode=opt'.
>>> WARNING: The lower priority option '-c opt' does not override the previous 
>>> value '--compilation_mode=opt'.
>>> ____Downloading 
>>> https://github.com/bazelbuild/rules_closure/archive/4af89ef1db659eb41f110df189b67d4cf14073e1.tar.gz
>>>  via codeload.github.com: 40,240 bytes
>>> ____Downloading 
>>> https://github.com/bazelbuild/rules_closure/archive/4af89ef1db659eb41f110df189b67d4cf14073e1.tar.gz
>>>  via codeload.github.com: 205,436 bytes
>>> ____Loading package: tensorflow/tools/pip_package
>>> ____Loading package: @bazel_tools//tools/cpp
>>> ____Loading package: @local_jdk//
>>> ____Loading package: @local_config_cc//
>>> ____Loading complete.  Analyzing...
>>> ERROR: 
>>> /home/niflheim/schiotz/easybuild_experimental/sandybridge/build/TensorFlow/1.4.0/foss-2017b-Python-3.6.3/tensorflow-1.4.0/tensorflow/tools/pip_package/BUILD:139:1:
>>>  error loading package 'tensorflow': Encountered error while reading 
>>> extension file 'protobuf.bzl': no such package '@protobuf_archive//': 
>>> java.io.IOException: Error downloading 
>>> [http://mirror.bazel.build/github.com/google/protobuf/archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz]
>>>  to 
>>> /tmp/eb-GpWEyg/tmpfJrPWS-bazel-build/external/protobuf_archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz:
>>>  GET returned 502 Bad Gateway and referenced by 
>>> '//tensorflow/tools/pip_package:build_pip_package'.
>>> ERROR: 
>>> /home/niflheim/schiotz/easybuild_experimental/sandybridge/build/TensorFlow/1.4.0/foss-2017b-Python-3.6.3/tensorflow-1.4.0/tensorflow/tools/pip_package/BUILD:139:1:
>>>  error loading package 'tensorflow': Encountered error while reading 
>>> extension file 'protobuf.bzl': no such package '@protobuf_archive//': 
>>> java.io.IOException: Error downloading 
>>> [http://mirror.bazel.build/github.com/google/protobuf/archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz]
>>>  to 
>>> /tmp/eb-GpWEyg/tmpfJrPWS-bazel-build/external/protobuf_archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz:
>>>  GET returned 502 Bad Gateway and referenced by 
>>> '//tensorflow/tools/pip_package:build_pip_package'.
>>> ERROR: Analysis of target 
>>> '//tensorflow/tools/pip_package:build_pip_package' failed; build aborted: 
>>> error loading package 'tensorflow': Encountered error while reading 
>>> extension file 'protobuf.bzl': no such package '@protobuf_archive//': 
>>> java.io.IOException: Error downloading 
>>> [http://mirror.bazel.build/github.com/google/protobuf/archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz]
>>>  to 
>>> /tmp/eb-GpWEyg/tmpfJrPWS-bazel-build/external/protobuf_archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz:
>>>  GET returned 502 Bad Gateway.
>>> ____Elapsed time: 6.561s
>>> (at easybuild/tools/run.py:481 in parse_cmd_output)
>>> == 2018-01-05 14:07:30,582 easyblock.py:2685 WARNING build failed (first 
>>> 300 chars): cmd "bazel --output_base=/tmp/eb-GpWEyg/tmpfJrPWS-bazel-build 
>>> build --compilation_mode=opt --config=opt --subcommands --verbose_failures  
>>> --config=mkl //tensorflow/tools/pip_package:build_pip_package" exited with 
>>> exit code 1 and output:
>>> ............
>>> 
>>> 
>>>> On 5 Jan 2018, at 13:50, Kenneth Hoste <[email protected]> wrote:
>>>> 
>>>> Hi Jakob,
>>>> 
>>>> On 05/01/2018 13:19, Jakob Schiøtz wrote:
>>>>> Hi Kenneth,
>>>>> 
>>>>> Is it possible that you forgot to check in the patches 
>>>>> TensorFlow-1.4.0_swig-env.patch and TensorFlow-1.4.0_no-enum34.patch in 
>>>>> your PR?  Attempting to build TensorFlow fails because it cannot find 
>>>>> these.
>>>> The patch files are available from 
>>>> https://github.com/easybuilders/easybuild-easyconfigs/pull/5318 (as 
>>>> mentioned in the description of the PR).
>>>> 
>>>> 
>>>> regards,
>>>> 
>>>> Kenneth
>>>>> Best regards
>>>>> 
>>>>> Jakob
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>> On 4 Jan 2018, at 16:37, Jakob Schiøtz <[email protected]> wrote:
>>>>>> 
>>>>>> Dear Kenneth, Pablo and Maxime,
>>>>>> 
>>>>>> Thanks for your feedback.  Yes, I will try to see if I can build from 
>>>>>> source, but I will focus on the foss toolchain since we use that one for 
>>>>>> our Python here (we do not have the Intel MPI license, and the iomkl 
>>>>>> toolchain could not build Python last time I tried).
>>>>>> 
>>>>>> I assume the reason for building from source is to ensure consistent 
>>>>>> library versions etc.  If that proves very difficult, could we perhaps 
>>>>>> in the interim have builds (with a -bin suffix?) using the prebuilt 
>>>>>> wheels?
>>>>>> 
>>>>>> Best regards
>>>>>> 
>>>>>> Jakob
>>>>>> 
>>>>>> 
>>>>>>> On 4 Jan 2018, at 15:29, Kenneth Hoste <[email protected]> wrote:
>>>>>>> 
>>>>>>> Dear Jakob,
>>>>>>> 
>>>>>>> On 04/01/2018 10:23, Jakob Schiøtz wrote:
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> I made a TensorFlow easyconfig a while ago depending on Python with 
>>>>>>>> the foss toolchain; and including a variant with GPU support (PR 
>>>>>>>> 4904).  The latter has not yet been merged, probably because it is 
>>>>>>>> annoying to have something that can only build on a machine with a GPU 
>>>>>>>> (it fails the sanity check otherwise, as TensorFlow with GPU support 
>>>>>>>> cannot load on a machine without it).
>>>>>>> Not being able to test this on a non-GPU system is a bit unfortunate, 
>>>>>>> but that's not the reason it hasn't been merged yet; that's mostly 
>>>>>>> due to a lack of time on my side to get back to it...
>>>>>>> 
>>>>>>>> Since I made that PR, two newer releases of TensorFlow have appeared 
>>>>>>>> (1.3 and 1.4).  There are easyconfigs for 1.3 with the Intel 
>>>>>>>> toolchain.  I am considering making easyconfigs for TensorFlow 1.4 with 
>>>>>>>> Python-3.6.3-foss-2017b (both with and without GPU support), but first 
>>>>>>>> I would like to know if anybody else is doing this - it is my 
>>>>>>>> impression that somebody who actually knows what they are doing may be 
>>>>>>>> working on TensorFlow. :-)
>>>>>>> I have spent quite a bit of time puzzling together an easyblock that 
>>>>>>> supports building TensorFlow from source, see [1].
>>>>>>> 
>>>>>>> It already works for non-GPU installations (see [2] for example), but 
>>>>>>> it's not entirely finished yet because:
>>>>>>> 
>>>>>>> * building from source with CUDA support does not work yet, the build 
>>>>>>> fails with strange Bazel errors...
>>>>>>> 
>>>>>>> * there are some issues when the TensorFlow easyblock is used together 
>>>>>>> with --use-ccache and the Intel compilers;
>>>>>>> because two compiler wrappers are used, they end up calling each other 
>>>>>>> resulting in a "fork bomb" style situation...
>>>>>>> 
>>>>>>> I would really like to get it finished and have easyconfigs available 
>>>>>>> for TensorFlow 1.4 and newer where we properly build TensorFlow from 
>>>>>>> source rather than using the binary wheels...
>>>>>>> 
>>>>>>> Are you up for giving it a try, and maybe helping out with the problems 
>>>>>>> mentioned above?
>>>>>>> 
>>>>>>> 
>>>>>>> regards,
>>>>>>> 
>>>>>>> Kenneth
>>>>>>> 
>>>>>>> 
>>>>>>> [1] https://github.com/easybuilders/easybuild-easyblocks/pull/1287
>>>>>>> [2] https://github.com/easybuilders/easybuild-easyconfigs/pull/5499
>>>>>>> 
>>>>>>>> Best regards
>>>>>>>> 
>>>>>>>> Jakob
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Jakob Schiøtz, professor, Ph.D.
>>>>>>>> Department of Physics
>>>>>>>> Technical University of Denmark
>>>>>>>> DK-2800 Kongens Lyngby, Denmark
>>>>>>>> http://www.fysik.dtu.dk/~schiotz/
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>> 
> 

--
Jakob Schiøtz, professor, Ph.D.
Department of Physics
Technical University of Denmark
DK-2800 Kongens Lyngby, Denmark
http://www.fysik.dtu.dk/~schiotz/


