Dear Kenneth,

Now I have figured out what goes wrong, but not why.
I am running with Python 3.6.3 compiled with foss/2017b. I have two versions of TensorFlow 1.4: the one built from source using your .eb, and a “binary” variant that installs from the binary package just like my 1.2.1 easyconfig.

I am running a cut-down version of a production script made by a student. He did not focus on optimization; as soon as it ran fast enough on his gaming PC at home, he concentrated on the science :-) The script is probably not ideal for timing purposes, as it does some on-the-fly generation of the training set, but while running on CPUs only that is insignificant. On a GPU it will probably dominate the runtime; I will address that later.

I run it CPU-only on a compute node with 16 CPU cores. For some reason that I do not understand at all, the version of TensorFlow built from source decides to use only two cores (in top I can see the Python process maxing out at 200%), whereas the pre-built version uses the majority of the cores (top shows it maxing out around 900-1000%). This is the reason for the discrepancy in run time.

I tried adding

    config = tf.ConfigProto()
    config.intra_op_parallelism_threads = 16
    config.inter_op_parallelism_threads = 16
    with tf.Session(config=config) as sess:
        ...

to the script, but that did not change anything. I have no clue what the problem is.
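(One sanity check I still want to run, sketched below in plain Python - nothing TensorFlow-specific, so take it as an illustration only: verify that the process is actually *allowed* to use all 16 cores. If the batch system pinned the job to two cores via CPU affinity or cgroups, TensorFlow's own thread settings would not help.)

```python
import os

# Logical CPUs the machine reports:
print("os.cpu_count():", os.cpu_count())

# CPUs this process may actually run on (Linux-only call);
# a scheduler/cgroup restriction would show up here and
# could explain a 200% ceiling in top:
print("allowed CPUs:", len(os.sched_getaffinity(0)))
```

(If the second number prints 2 while the first prints 16, the problem is the job environment rather than the TensorFlow build.)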
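(On a related note: when a script hard-imports an optional TensorFlow submodule that only some builds expose, the availability can be probed without crashing the whole script. A generic sketch - the helper name is my own invention, and the pattern is plain Python, not anything TensorFlow-specific:)

```python
import importlib

def module_available(name):
    """Return True if `name` imports cleanly, False instead of raising."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False

# stdlib module, always present:
print(module_available("json"))  # True
# present only in TF builds whose contrib API includes it:
print(module_available("tensorflow.contrib.data.python.ops.interleave_ops"))
```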
I guess I will just continue my timing on two cores only, and worry about this later… I also tried to use the timing script you recommended, but regardless of which version of TensorFlow I use, it crashes with this error:

  File "/home/niflheim/schiotz/development/benchmarks/scripts/tf_cnn_benchmarks/preprocessing.py", line 23, in <module>
    from tensorflow.contrib.data.python.ops import interleave_ops
ImportError: cannot import name 'interleave_ops'

Best regards

Jakob


> On 8 Jan 2018, at 21:34, Kenneth Hoste <kenneth.ho...@ugent.be> wrote:
> 
> On 08/01/2018 21:28, Jakob Schiøtz wrote:
>>> On 8 Jan 2018, at 20:27, Kenneth Hoste <kenneth.ho...@ugent.be> wrote:
>>> 
>>> On 08/01/2018 15:48, Jakob Schiøtz wrote:
>>>> Hi Kenneth,
>>>> 
>>>> I have now tested your TensorFlow 1.4.0 eb on our machines with a real-world script. It works, but it runs three times slower than with the prebuilt TensorFlow 1.2.1 :-(
>>>> 
>>>> The prebuilt version complains that it was built without AVX2 etc., so I do not really understand why it is so much slower to use the version compiled from source - assuming of course that there is not a factor three performance loss between 1.2.1 and 1.4.0, which seems unlikely.
>>> Wow, that must be wrong somehow...
>>> 
>>> Is this on the GPU systems?
>>> You're not comparing a GPU-enabled TF 1.2 with a CPU-only TF 1.4 built with EB, are you?
>>> If you are, then only a factor 3 slower using only CPU is actually quite impressive vs a GPU-enabled build. ;-)
>> No, I am comparing non-GPU-enabled versions running on a machine without a GPU. So that is not the problem.
>> 
>> I am running a custom script training one of my students’ models. I agree the result is suspicious, and I am rerunning it now (in the queue).
>> 
>> I will try the benchmark you mentioned below as well, and report back - but it may be a few days…
>> 
>> By the way, could the difference be due to the compiler (Intel versus foss)?
>> That would be an unusually large difference, but my own MD code (ASAP) displays almost a factor two difference.
> 
> Which is which? Did you install the binary wheel on top of a Python built with foss or Intel?
> 
> That could certainly matter, but I would be very surprised if it's more than 10-20%, to be honest.
> 
> I saw a 10% performance loss for TF 1.4 built with intel/2017b vs foss/2017b (on top of Python 3.6.3) on Haswell (so the foss build was slightly faster).
> 
> 
> regards,
> 
> Kenneth
> 
>> 
>> Jakob
>> 
>> 
>>> How are you benchmarking this exactly?
>>> When I was trying with the script from https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks, I saw 7x better performance when building TF 1.4.0 from source on Intel Haswell (no GPU) compared to a conda install (which is basically the same as using the binary wheel).
>>> On a GPU system (NVIDIA K40) with the TF 1.4.0 binary wheel, I saw another 8x performance increase over the EB-installed-from-source CPU-only TF 1.4.0 installation.
>>> 
>>> Here's the command I was running (don't forget to change --device when running on a GPU system):
>>> 
>>> python tf_cnn_benchmarks.py --device cpu --batch_size=32 --model=resnet50 --variable_update=parameter_server --data_format NHWC
>>> 
>>> 
>>> regards,
>>> 
>>> Kenneth
>>> 
>>>> Best regards
>>>> 
>>>> Jakob
>>>> 
>>>> 
>>>>> On 5 Jan 2018, at 13:57, Kenneth Hoste <kenneth.ho...@ugent.be> wrote:
>>>>> 
>>>>> On 04/01/2018 16:37, Jakob Schiøtz wrote:
>>>>>> Dear Kenneth, Pablo and Maxime,
>>>>>> 
>>>>>> Thanks for your feedback. Yes, I will try to see if I can build from source, but I will focus on the foss toolchain since we use that one for our Python here (we do not have the Intel MPI license, and the iomkl toolchain could not build Python last time I tried).
>>>>>> 
>>>>>> I assume the reason for building from source is to ensure consistent library versions etc.
>>>>>> If that proves very difficult, could we perhaps in the interim have builds (with a -bin suffix?) using the prebuilt wheels?
>>>>> The main reason for building from source is performance and compatibility with the OS.
>>>>> 
>>>>> The binary wheels that are available for TensorFlow are not compatible with older OS versions like CentOS 6, as I experienced first-hand when trying to get it to work on an older (GPU) system.
>>>>> Since the compilation from source with CUDA support didn't work yet, I had to resort to injecting a newer glibc version into the 'python' binary, which was not fun (well...).
>>>>> 
>>>>> For CPU-only installations, you really have no other option than building from source, since the binary wheels were not built with AVX2 instructions, for example, which leads to large performance losses (some quick benchmarking showed a 7x increase in performance for TF 1.4 built with foss/2017b over using the binary wheel).
>>>>> 
>>>>> For GPU installations, a similar concern arises, although it may be less severe there, depending on which CUDA compute capabilities the binary wheels were built with (I only tested the wheels on old systems with NVIDIA K20x/K40 GPUs, so there I doubt you'll get much performance increase when building from source).
>>>>> 
>>>>> If it turns out to be too difficult or time-consuming to get the build from source with CUDA support to work, then we can of course progress with sticking to the binary wheel releases for now; I'm not going to oppose that.
>>>>> 
>>>>> 
>>>>> regards,
>>>>> 
>>>>> Kenneth
>>>>> 
>>>>>> Best regards
>>>>>> 
>>>>>> Jakob
>>>>>> 
>>>>>> 
>>>>>>> On 4 Jan 2018, at 15:29, Kenneth Hoste <kenneth.ho...@ugent.be> wrote:
>>>>>>> 
>>>>>>> Dear Jakob,
>>>>>>> 
>>>>>>> On 04/01/2018 10:23, Jakob Schiøtz wrote:
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> I made a TensorFlow easyconfig a while ago, depending on Python with the foss toolchain and including a variant with GPU support (PR 4904). The latter has not yet been merged, probably because it is annoying to have something that can only build on a machine with a GPU (it fails the sanity check otherwise, as TensorFlow with GPU support cannot load on a machine without one).
>>>>>>> Not being able to test this on a non-GPU system is a bit unfortunate, but that's not the reason it hasn't been merged yet; that's mostly due to a lack of time on my side to get back to it...
>>>>>>> 
>>>>>>>> Since I made that PR, two newer releases of TensorFlow have appeared (1.3 and 1.4). There are easyconfigs for 1.3 with the Intel toolchain. I am considering making easyconfigs for TensorFlow 1.4 with Python-3.6.3-foss-2017b (both with and without GPU support), but first I would like to know if anybody else is doing this - it is my impression that somebody who actually knows what they are doing may be working on TensorFlow. :-)
>>>>>>> I have spent quite a bit of time puzzling together an easyblock that supports building TensorFlow from source, see [1].
>>>>>>> 
>>>>>>> It already works for non-GPU installations (see [2] for an example), but it's not entirely finished yet because:
>>>>>>> 
>>>>>>> * building from source with CUDA support does not work yet; the build fails with strange Bazel errors...
>>>>>>> 
>>>>>>> * there are some issues when the TensorFlow easyblock is used together with --use-ccache and the Intel compilers; because two compiler wrappers are used, they end up calling each other, resulting in a "fork bomb" style situation...
>>>>>>> 
>>>>>>> I would really like to get it finished and have easyconfigs available for TensorFlow 1.4 and newer, where we properly build TensorFlow from source rather than using the binary wheels...
>>>>>>> 
>>>>>>> Are you up for giving it a try, and maybe helping out with the problems mentioned above?
>>>>>>> 
>>>>>>> 
>>>>>>> regards,
>>>>>>> 
>>>>>>> Kenneth
>>>>>>> 
>>>>>>> 
>>>>>>> [1] https://github.com/easybuilders/easybuild-easyblocks/pull/1287
>>>>>>> [2] https://github.com/easybuilders/easybuild-easyconfigs/pull/5499
>>>>>>> 
>>>>>>>> Best regards
>>>>>>>> 
>>>>>>>> Jakob
>>>>>>>> 
>>>>>>>> -- 
>>>>>>>> Jakob Schiøtz, professor, Ph.D.
>>>>>>>> Department of Physics
>>>>>>>> Technical University of Denmark
>>>>>>>> DK-2800 Kongens Lyngby, Denmark
>>>>>>>> http://www.fysik.dtu.dk/~schiotz/

-- 
Jakob Schiøtz, professor, Ph.D.
Department of Physics
Technical University of Denmark
DK-2800 Kongens Lyngby, Denmark
http://www.fysik.dtu.dk/~schiotz/