Dear Kenneth,

Now I have figured out what goes wrong, but not why.

I am running with Python 3.6.3 compiled with foss/2017b.  I have two versions 
of TensorFlow 1.4: the one built from source using your .eb, and a “binary” 
variant which installs from the binary package, just like my 1.2.1 
easyconfig.

I am running a cut-down version of a production script made by a student.  He 
did not focus on optimization; as soon as it ran fast enough on his gaming PC 
at home, he concentrated on the science :-)  The script is probably not ideal 
for timing purposes, as it does some on-the-fly generation of the training set, 
but when running on CPUs only that is insignificant.  On a GPU it will 
probably dominate the runtime; I will address that later.

I run it CPU-only on a compute node with 16 CPU cores.  For some reason that I 
do not understand at all, the version of TensorFlow built from source decides 
to use only two cores (in top I can see the Python process maxing out at 200%), 
whereas the pre-built version uses the majority of the cores (top shows it 
maxing out around 900-1000%).  This is the reason for the discrepancy in run 
time.

I tried adding 

    config = tf.ConfigProto()
    config.intra_op_parallelism_threads = 16
    config.inter_op_parallelism_threads = 16

    with tf.Session(config=config) as sess:
        ...  # the existing training code, now running in the configured session

to the script, but that did not change anything.  I have no clue what the 
problem is.  I guess I will just continue my timing on two cores only, and 
worry about this later…

I also tried to use the timing script you recommended, but regardless of which 
version of TensorFlow I use, it crashes with this error:
      File "/home/niflheim/schiotz/development/benchmarks/scripts/tf_cnn_benchmarks/preprocessing.py", line 23, in <module>
        from tensorflow.contrib.data.python.ops import interleave_ops
    ImportError: cannot import name 'interleave_ops'
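
My guess is that the benchmarks script on master simply tracks a newer 
TensorFlow than 1.4 (where tensorflow.contrib.data apparently gained 
interleave_ops), so an older checkout of the tensorflow/benchmarks repository 
may be needed.  A quick probe one could run first, just a sketch:

    import importlib

    # Probe for the module the benchmarks script needs; if it is missing,
    # an older checkout of tensorflow/benchmarks is probably required for
    # this TensorFlow version.
    try:
        importlib.import_module(
            "tensorflow.contrib.data.python.ops.interleave_ops")
        print("this TensorFlow should work with the current benchmarks script")
    except ImportError:
        print("an older tensorflow/benchmarks checkout is probably needed")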

 
Best regards

Jakob


> On 8 Jan 2018, at 21:34, Kenneth Hoste <kenneth.ho...@ugent.be> wrote:
> 
> On 08/01/2018 21:28, Jakob Schiøtz wrote:
>>> On 8 Jan 2018, at 20:27, Kenneth Hoste <kenneth.ho...@ugent.be> wrote:
>>> 
>>> On 08/01/2018 15:48, Jakob Schiøtz wrote:
>>>> Hi Kenneth,
>>>> 
>>>> I have now tested your TensorFlow 1.4.0 eb on our machines with a 
>>>> real-world script.  It works, but it runs three times slower than with the 
>>>> prebuilt TensorFlow 1.2.1  :-(
>>>> 
>>>> The prebuilt version complains that it was built without AVX2 etc., so I do 
>>>> not really understand why it is so much slower to use the version compiled 
>>>> from source - assuming of course that there is not a factor three 
>>>> performance loss between 1.2.1 and 1.4.0; which seems unlikely.
>>> Wow, that must be wrong somehow...
>>> 
>>> Is this on the GPU systems?
>>> You're not comparing a GPU-enabled TF 1.2 with a CPU-only TF 1.4 built with 
>>> EB, are you?
>>> If you are, then only a factor of 3 slower using only CPU is actually quite 
>>> impressive vs. a GPU-enabled build. ;-)
>> No, I am comparing non-GPU-enabled versions running on a machine without a 
>> GPU.  So that is not the problem.
>> 
>> I am running a custom script training one of my students’ models.  I agree 
>> the result is suspicious, and I am rerunning it now (in the queue).
>> 
>> I will try the benchmark you mentioned below as well; and report back - but 
>> it may be a few days…
>> 
>> By the way, could the difference be due to the compiler (Intel versus foss)? 
>>  That would be an unusually large difference, but my own MD code (ASAP) 
>> displays almost a factor two difference.
> 
> Which is which? Did you install the binary wheel on top of a Python built 
> with foss or Intel?
> 
> That could certainly matter, but I would be very surprised if it's more than 
> 10-20% to be honest.
> 
> I saw 10% performance loss for TF 1.4 built with intel/2017b vs foss/2017b 
> (on top of Python 3.6.3) on Haswell (so the foss build was slightly faster).
> 
> 
> regards,
> 
> Kenneth
> 
>> 
>> Jakob
>> 
>> 
>>> How are you benchmarking this exactly?
>>> When I was trying with the script from 
>>> https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks,
>>>  I saw 7x better performance when building TF 1.4.0 from source on Intel 
>>> Haswell (no GPU) compared to a conda install (which is basically the same 
>>> as using the binary wheel).
>>> On a GPU system (NVIDIA K40) with the TF 1.4.0 binary wheel I saw another 
>>> 8x performance increase over the EB-installed-from-source CPU-only TF 1.4.0 
>>> installation.
>>> 
>>> Here's the command I was running (don't forget to change --device when 
>>> running on a GPU system):
>>> 
>>> python tf_cnn_benchmarks.py --device cpu --batch_size=32 --model=resnet50 
>>> --variable_update=parameter_server --data_format NHWC
>>> 
>>> 
>>> regards,
>>> 
>>> Kenneth
>>> 
>>>> Best regards
>>>> 
>>>> Jakob
>>>> 
>>>> 
>>>>> On 5 Jan 2018, at 13:57, Kenneth Hoste <kenneth.ho...@ugent.be> wrote:
>>>>> 
>>>>> On 04/01/2018 16:37, Jakob Schiøtz wrote:
>>>>>> Dear Kenneth, Pablo and Maxime,
>>>>>> 
>>>>>> Thanks for your feedback.  Yes, I will try to see if I can build from 
>>>>>> source, but I will focus on the foss toolchain since we use that one for 
>>>>>> our Python here (we do not have the Intel MPI license, and the iomkl 
>>>>>> toolchain could not build Python last time I tried).
>>>>>> 
>>>>>> I assume the reason for building from source is to ensure consistent 
>>>>>> library versions etc.  If that proves very difficult, could we perhaps 
>>>>>> in the interim have builds (with a -bin suffix?) using the prebuilt 
>>>>>> wheels?
>>>>> The main reason for building from source is performance and compatibility 
>>>>> with the OS.
>>>>> 
>>>>> The binary wheels that are available for TensorFlow are not compatible 
>>>>> with older OS versions like CentOS 6, as I experienced first-hand when 
>>>>> trying to get it to work on an older (GPU) system.
>>>>> Since the compilation from source with CUDA support didn't work yet, I 
>>>>> had to resort to injecting a newer glibc version in the 'python' binary, 
>>>>> which was not fun (well...).
>>>>> 
>>>>> For CPU-only installations, you really have no other option than building 
>>>>> from source, since the binary wheels were not built with AVX2 
>>>>> instructions for example, which leads to large performance losses (some 
>>>>> quick benchmarking showed a 7x increase in performance for TF 1.4 built 
>>>>> with foss/2017b over using the binary wheel).
>>>>> 
>>>>> For GPU installations, a similar concern arises, although it may be less 
>>>>> severe there, depending on what CUDA compute capabilities the binary 
>>>>> wheels were built with (I only tested the wheels on old systems with 
>>>>> NVIDIA K20x/K40 GPUs, so there I doubt you'll get much performance 
>>>>> increase when building from source).
>>>>> 
>>>>> If it turns out to be too difficult or time-consuming to get the build 
>>>>> from source with CUDA support to work, then we can of course progress 
>>>>> with sticking to the binary wheel releases for now, I'm not going to 
>>>>> oppose that.
>>>>> 
>>>>> 
>>>>> regards,
>>>>> 
>>>>> Kenneth
>>>>> 
>>>>>> Best regards
>>>>>> 
>>>>>> Jakob
>>>>>> 
>>>>>> 
>>>>>>> On 4 Jan 2018, at 15:29, Kenneth Hoste <kenneth.ho...@ugent.be> wrote:
>>>>>>> 
>>>>>>> Dear Jakob,
>>>>>>> 
>>>>>>> On 04/01/2018 10:23, Jakob Schiøtz wrote:
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> I made a TensorFlow easyconfig a while ago depending on Python with 
>>>>>>>> the foss toolchain, including a variant with GPU support (PR 
>>>>>>>> 4904).  The latter has not yet been merged, probably because it is 
>>>>>>>> annoying to have something that can only build on a machine with a GPU 
>>>>>>>> (it fails the sanity check otherwise, as TensorFlow with GPU support 
>>>>>>>> cannot load on a machine without it).
>>>>>>> Not being able to test this on a non-GPU system is a bit unfortunate, 
>>>>>>> but that's not a reason that it hasn't been merged yet, that's mostly 
>>>>>>> due to a lack of time from my side to get back to it...
>>>>>>> 
>>>>>>>> Since I made that PR, two newer releases of TensorFlow have appeared 
>>>>>>>> (1.3 and 1.4).   There are easyconfigs for 1.3 with the Intel 
>>>>>>>> toolchain.  I am considering making easyconfigs for TensorFlow 1.4 with 
>>>>>>>> Python-3.6.3-foss-2017b (both with and without GPU support), but first 
>>>>>>>> I would like to know if anybody else is doing this - it is my 
>>>>>>>> impression that somebody who actually knows what they are doing may be 
>>>>>>>> working on TensorFlow. :-)
>>>>>>> I have spent quite a bit of time puzzling together an easyblock that 
>>>>>>> supports building TensorFlow from source, see [1].
>>>>>>> 
>>>>>>> It already works for non-GPU installations (see [2] for example), but 
>>>>>>> it's not entirely finished yet because:
>>>>>>> 
>>>>>>> * building from source with CUDA support does not work yet, the build 
>>>>>>> fails with strange Bazel errors...
>>>>>>> 
>>>>>>> * there are some issues when the TensorFlow easyblock is used together 
>>>>>>> with --use-ccache and the Intel compilers;
>>>>>>>   because two compiler wrappers are used, they end up calling each 
>>>>>>> other resulting in a "fork bomb" style situation...
>>>>>>> 
>>>>>>> I would really like to get it finished and have easyconfigs available 
>>>>>>> for TensorFlow 1.4 and newer where we properly build TensorFlow from 
>>>>>>> source rather than using the binary wheels...
>>>>>>> 
>>>>>>> Are you up for giving it a try, and maybe helping out with the problems 
>>>>>>> mentioned above?
>>>>>>> 
>>>>>>> 
>>>>>>> regards,
>>>>>>> 
>>>>>>> Kenneth
>>>>>>> 
>>>>>>> 
>>>>>>> [1] https://github.com/easybuilders/easybuild-easyblocks/pull/1287
>>>>>>> [2] https://github.com/easybuilders/easybuild-easyconfigs/pull/5499
>>>>>>> 
>>>>>>>> Best regards
>>>>>>>> 
>>>>>>>> Jakob
>>>>>>>> 
> 

--
Jakob Schiøtz, professor, Ph.D.
Department of Physics
Technical University of Denmark
DK-2800 Kongens Lyngby, Denmark
http://www.fysik.dtu.dk/~schiotz/


