> On 5 Jan 2018, at 15:18, Kenneth Hoste <kenneth.ho...@ugent.be> wrote:
>
> On 05/01/2018 14:13, Jakob Schiøtz wrote:
>> Hi again,
>>
>> Yes, I have overlooked that - I just switched my repo to your branch and
>> tried to build :-)
>>
>> Now I get an error when building TensorFlow. It is a 502 Bad Gateway,
>> indicating that some server is down somewhere. But is it not a problem that
>> the build process itself tried to download extra stuff in addition to the
>> source files listed in the .eb file? At least it makes the checksum
>> checking moot.
>
> That's indeed a problem, but one that is hard to avoid with TensorFlow, at
> least in a first iteration...
>
> Once we're happy with the current approach, a new target could be to get
> TensorFlow to build "offline".
>
> One step at a time though... ;-)

It could be a showstopper for me, though. On our cluster, only two nodes
have GPUs. With the binary build, I could only install TensorFlow on those,
since although CUDA and friends are available on all the nodes, you can only
load the resulting TensorFlow module on a machine with a GPU. Unfortunately,
these two nodes are officially compute nodes, not login nodes, and that
means that they are cut off from the Internet. So no downloading is possible
on these. :-(

So I have two questions:

1. What do we expect to gain by building from source instead of installing
   from the wheel?

2. Would it be OK to have a “-bin” variant installing from the binary
   distribution until we get these issues ironed out?

In my second attempt, I managed to build with foss/2017b (obviously the
server was up again). I have not really tested it yet (I am only just
dabbling in TensorFlow and my main application is crashing due to another
problem).

Do you want me to submit the new .eb file as a PR to your PR? Or should I
just wait till your stuff has converged?

/Jakob

>
>
> regards,
>
> Kenneth
>>
>> Best regards
>>
>> Jakob
>>
>>
>> ............
>> WARNING: The lower priority option '-c opt' does not override the previous
>> value '--compilation_mode=opt'.
>> WARNING: The lower priority option '-c opt' does not override the previous
>> value '--compilation_mode=opt'.
>> ____Downloading
>> https://github.com/bazelbuild/rules_closure/archive/4af89ef1db659eb41f110df189b67d4cf14073e1.tar.gz
>> via codeload.github.com: 40,240 bytes
>> ____Downloading
>> https://github.com/bazelbuild/rules_closure/archive/4af89ef1db659eb41f110df189b67d4cf14073e1.tar.gz
>> via codeload.github.com: 205,436 bytes
>> ____Loading package: tensorflow/tools/pip_package
>> ____Loading package: @bazel_tools//tools/cpp
>> ____Loading package: @local_jdk//
>> ____Loading package: @local_config_cc//
>> ____Loading complete. Analyzing...
>> ERROR:
>> /home/niflheim/schiotz/easybuild_experimental/sandybridge/build/TensorFlow/1.4.0/foss-2017b-Python-3.6.3/tensorflow-1.4.0/tensorflow/tools/pip_package/BUILD:139:1:
>> error loading package 'tensorflow': Encountered error while reading
>> extension file 'protobuf.bzl': no such package '@protobuf_archive//':
>> java.io.IOException: Error downloading
>> [http://mirror.bazel.build/github.com/google/protobuf/archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz]
>> to
>> /tmp/eb-GpWEyg/tmpfJrPWS-bazel-build/external/protobuf_archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz:
>> GET returned 502 Bad Gateway and referenced by
>> '//tensorflow/tools/pip_package:build_pip_package'.
>> ERROR:
>> /home/niflheim/schiotz/easybuild_experimental/sandybridge/build/TensorFlow/1.4.0/foss-2017b-Python-3.6.3/tensorflow-1.4.0/tensorflow/tools/pip_package/BUILD:139:1:
>> error loading package 'tensorflow': Encountered error while reading
>> extension file 'protobuf.bzl': no such package '@protobuf_archive//':
>> java.io.IOException: Error downloading
>> [http://mirror.bazel.build/github.com/google/protobuf/archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz]
>> to
>> /tmp/eb-GpWEyg/tmpfJrPWS-bazel-build/external/protobuf_archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz:
>> GET returned 502 Bad Gateway and referenced by
>> '//tensorflow/tools/pip_package:build_pip_package'.
>> ERROR: Analysis of target '//tensorflow/tools/pip_package:build_pip_package'
>> failed; build aborted: error loading package 'tensorflow': Encountered error
>> while reading extension file 'protobuf.bzl': no such package
>> '@protobuf_archive//': java.io.IOException: Error downloading
>> [http://mirror.bazel.build/github.com/google/protobuf/archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz]
>> to
>> /tmp/eb-GpWEyg/tmpfJrPWS-bazel-build/external/protobuf_archive/b04e5cba356212e4e8c66c61bbe0c3a20537c5b9.tar.gz:
>> GET returned 502 Bad Gateway.
>> ____Elapsed time: 6.561s
>> (at easybuild/tools/run.py:481 in parse_cmd_output)
>> == 2018-01-05 14:07:30,582 easyblock.py:2685 WARNING build failed (first 300
>> chars): cmd "bazel --output_base=/tmp/eb-GpWEyg/tmpfJrPWS-bazel-build build
>> --compilation_mode=opt --config=opt --subcommands --verbose_failures
>> --config=mkl //tensorflow/tools/pip_package:build_pip_package" exited with
>> exit code 1 and output:
>> ............
>>
>>
>>> On 5 Jan 2018, at 13:50, Kenneth Hoste <kenneth.ho...@ugent.be> wrote:
>>>
>>> Hi Jakob,
>>>
>>> On 05/01/2018 13:19, Jakob Schiøtz wrote:
>>>> Hi Kenneth,
>>>>
>>>> Is it possible that you forgot to check in the patches
>>>> TensorFlow-1.4.0_swig-env.patch and TensorFlow-1.4.0_no-enum34.patch in
>>>> your PR? Attempting to build TensorFlow fails because it cannot find
>>>> these.
>>> The patch files are available from
>>> https://github.com/easybuilders/easybuild-easyconfigs/pull/5318 (as
>>> mentioned in the description of the PR).
>>>
>>>
>>> regards,
>>>
>>> Kenneth
>>>> Best regards
>>>>
>>>> Jakob
>>>>
>>>>
>>>>
>>>>
>>>>> On 4 Jan 2018, at 16:37, Jakob Schiøtz <schi...@fysik.dtu.dk> wrote:
>>>>>
>>>>> Dear Kenneth, Pablo and Maxime,
>>>>>
>>>>> Thanks for your feedback. Yes, I will try to see if I can build from
>>>>> source, but I will focus on the foss toolchain since we use that one for
>>>>> our Python here (we do not have the Intel MPI license, and the iomkl
>>>>> toolchain could not build Python last time I tried).
>>>>>
>>>>> I assume the reason for building from source is to ensure consistent
>>>>> library versions etc. If that proves very difficult, could we perhaps in
>>>>> the interim have builds (with a -bin suffix?) using the prebuilt wheels?
>>>>>
>>>>> Best regards
>>>>>
>>>>> Jakob
>>>>>
>>>>>
>>>>>> On 4 Jan 2018, at 15:29, Kenneth Hoste <kenneth.ho...@ugent.be> wrote:
>>>>>>
>>>>>> Dear Jakob,
>>>>>>
>>>>>> On 04/01/2018 10:23, Jakob Schiøtz wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I made a TensorFlow easyconfig a while ago depending on Python with the
>>>>>>> foss toolchain; and including a variant with GPU support (PR 4904).
>>>>>>> The latter has not yet been merged, probably because it is annoying to
>>>>>>> have something that can only build on a machine with a GPU (it fails
>>>>>>> the sanity check otherwise, as TensorFlow with GPU support cannot load
>>>>>>> on a machine without it).
>>>>>> Not being able to test this on a non-GPU system is a bit unfortunate,
>>>>>> but that's not a reason that it hasn't been merged yet, that's mostly
>>>>>> due to a lack of time from my side to get back to it...
>>>>>>
>>>>>>> Since I made that PR, two newer releases of TensorFlow have appeared
>>>>>>> (1.3 and 1.4). There are easyconfigs for 1.3 with the Intel
>>>>>>> toolchain. I am considering making easyconfigs for TensorFlow 1.4 with
>>>>>>> Python-3.6.3-foss-2017b (both with and without GPU support), but first
>>>>>>> I would like to know if anybody else is doing this - it is my
>>>>>>> impression that somebody who actually knows what they are doing may be
>>>>>>> working on TensorFlow. :-)
>>>>>> I have spent quite a bit of time puzzling together an easyblock that
>>>>>> supports building TensorFlow from source, see [1].
>>>>>>
>>>>>> It already works for non-GPU installations (see [2] for example), but
>>>>>> it's not entirely finished yet because:
>>>>>>
>>>>>> * building from source with CUDA support does not work yet, the build
>>>>>> fails with strange Bazel errors...
>>>>>>
>>>>>> * there are some issues when the TensorFlow easyblock is used together
>>>>>> with --use-ccache and the Intel compilers;
>>>>>> because two compiler wrappers are used, they end up calling each other
>>>>>> resulting in a "fork bomb" style situation...
>>>>>>
>>>>>> I would really like to get it finished and have easyconfigs available
>>>>>> for TensorFlow 1.4 and newer where we properly build TensorFlow from
>>>>>> source rather than using the binary wheels...
>>>>>>
>>>>>> Are you up for giving it a try, and maybe helping out with the problems
>>>>>> mentioned above?
>>>>>>
>>>>>>
>>>>>> regards,
>>>>>>
>>>>>> Kenneth
>>>>>>
>>>>>>
>>>>>> [1] https://github.com/easybuilders/easybuild-easyblocks/pull/1287
>>>>>> [2] https://github.com/easybuilders/easybuild-easyconfigs/pull/5499
>>>>>>
>>>>>>> Best regards
>>>>>>>
>>>>>>> Jakob
>>>>>>>
>>>>>>> --
>>>>>>> Jakob Schiøtz, professor, Ph.D.
>>>>>>> Department of Physics
>>>>>>> Technical University of Denmark
>>>>>>> DK-2800 Kongens Lyngby, Denmark
>>>>>>> http://www.fysik.dtu.dk/~schiotz/
>>>>>>>
>>>>>>>
>>>>>>>
>>>>> --
>>>>> Jakob Schiøtz, professor, Ph.D.
>>>>> Department of Physics
>>>>> Technical University of Denmark
>>>>> DK-2800 Kongens Lyngby, Denmark
>>>>> http://www.fysik.dtu.dk/~schiotz/
>>>>>
>>>>>
>>>>>
>>>> --
>>>> Jakob Schiøtz, professor, Ph.D.
>>>> Department of Physics
>>>> Technical University of Denmark
>>>> DK-2800 Kongens Lyngby, Denmark
>>>> http://www.fysik.dtu.dk/~schiotz/
>>>>
>>>>
>>>>
>> --
>> Jakob Schiøtz, professor, Ph.D.
>> Department of Physics
>> Technical University of Denmark
>> DK-2800 Kongens Lyngby, Denmark
>> http://www.fysik.dtu.dk/~schiotz/
>>
>>
>>
>
--
Jakob Schiøtz, professor, Ph.D.
Department of Physics
Technical University of Denmark
DK-2800 Kongens Lyngby, Denmark
http://www.fysik.dtu.dk/~schiotz/
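
On the checksum point raised in this thread: if the archives that Bazel
fetches during the build were downloaded ahead of time on a node with
Internet access, they could at least be verified before being copied to the
offline GPU nodes. A minimal Python sketch of that check, assuming a
hypothetical mapping from local file paths to expected SHA-256 digests (the
function names and the mapping are illustrative only and are not part of
EasyBuild or Bazel):

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large tarballs are not loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_prefetched(archives):
    """Check pre-downloaded archives against their expected digests.

    `archives` is a hypothetical dict mapping a local file path to the
    expected SHA-256 hex digest. Returns the list of paths that are
    missing or whose digest does not match.
    """
    bad = []
    for path, expected in archives.items():
        if not Path(path).is_file() or sha256_of(path) != expected.lower():
            bad.append(path)
    return bad
```

An empty list from verify_prefetched() means every listed file exists and
matches its digest; how the verified files would then be handed to the build
is a separate question that this sketch does not address.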