> On 8 Jan 2018, at 20:27, Kenneth Hoste <[email protected]> wrote:
> 
> On 08/01/2018 15:48, Jakob Schiøtz wrote:
>> Hi Kenneth,
>> 
>> I have now tested your TensorFlow 1.4.0 eb on our machines with a real-world 
>> script.  It works, but it runs three times slower than with the prebuilt 
>> TensorFlow 1.2.1  :-(
>> 
>> The prebuilt version complains that it was built without AVX2 etc, so I do 
>> not really understand why it is so much slower to use the version compiled 
>> from source - assuming of course that there is not a factor three 
>> performance loss between 1.2.1 and 1.4.0; which seems unlikely.
> 
> Wow, that must be wrong somehow...
> 
> Is this on the GPU systems?
> You're not comparing a GPU-enabled TF 1.2 with a CPU-only TF 1.4 built with 
> EB, are you?
> If you are, then being only a factor 3 slower on CPU alone is actually quite 
> impressive compared to a GPU-enabled build. ;-)

No, I am comparing non-GPU-enabled versions running on a machine without a GPU, 
so that is not the problem.

I am running a custom script that trains one of my students’ models.  I agree the 
result is suspicious, and I am rerunning it now (in the queue).

I will try the benchmark you mentioned below as well and report back, but it 
may be a few days…
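
For the rerun, here is a minimal sketch of how the two installations could be 
timed in a comparable way: run the same workload several times under each 
module and compare medians rather than single runs.  The names time_runs and 
slowdown are purely illustrative (they are not part of any existing tool), and 
the workload passed in is a placeholder for the real training step.

```python
import statistics
import time


def time_runs(fn, repeats=5):
    """Return the wall-clock duration in seconds of each of `repeats` calls to fn()."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()  # placeholder workload, e.g. one training epoch
        durations.append(time.perf_counter() - start)
    return durations


def slowdown(baseline, candidate):
    """Ratio of median candidate time to median baseline time (> 1 means slower)."""
    return statistics.median(candidate) / statistics.median(baseline)
```

The durations would be collected once per installation (each in its own job, 
with its own module loaded) and then compared with slowdown(); using the median 
makes a single slow run on a busy node less likely to distort the ratio.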

By the way, could the difference be due to the compiler (Intel versus foss)?  
That would be an unusually large difference, but my own MD code (ASAP) displays 
almost a factor two difference.
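
Whichever compiler is used, it may also be worth confirming which instruction 
sets the compute node itself advertises, since a source build can only benefit 
from AVX2 where the CPU has it.  A minimal sketch, assuming a Linux x86 node 
where /proc/cpuinfo contains a "flags" line; the function names and the default 
list of features are my own choice for illustration:

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text (first CPU)."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no "flags" line found (e.g. not an x86 Linux system)


def missing_features(cpuinfo_text, wanted=("avx", "avx2", "fma")):
    """List the wanted features that the CPU does not advertise."""
    flags = cpu_flags(cpuinfo_text)
    return [name for name in wanted if name not in flags]
```

On the node in question this could be called as 
missing_features(open("/proc/cpuinfo").read()); an empty list means that all of 
the listed features are reported by the CPU.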

Jakob


> 
> How are you benchmarking this exactly?
> When I was trying with the script from 
> https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks,
>  I saw 7x better performance when building TF 1.4.0 from source on Intel 
> Haswell (no GPU) compared to a conda install (which is basically the same as 
> using the binary wheel).
> On a GPU system (NVIDIA K40) with the TF 1.4.0 binary wheel I saw another 8x 
> performance increase over the EB-installed-from-source CPU-only TF 1.4.0 
> installation.
> 
> Here's the command I was running (don't forget to change --device when 
> running on a GPU system):
> 
> python tf_cnn_benchmarks.py --device cpu --batch_size=32 --model=resnet50 
> --variable_update=parameter_server --data_format NHWC
> 
> 
> regards,
> 
> Kenneth
> 
>> 
>> Best regards
>> 
>> Jakob
>> 
>> 
>>> On 5 Jan 2018, at 13:57, Kenneth Hoste <[email protected]> wrote:
>>> 
>>> On 04/01/2018 16:37, Jakob Schiøtz wrote:
>>>> Dear Kenneth, Pablo and Maxime,
>>>> 
>>>> Thanks for your feedback.  Yes, I will try to see if I can build from 
>>>> source, but I will focus on the foss toolchain since we use that one for 
>>>> our Python here (we do not have the Intel MPI license, and the iomkl 
>>>> toolchain could not build Python last time I tried).
>>>> 
>>>> I assume the reason for building from source is to ensure consistent 
>>>> library versions etc.  If that proves very difficult, could we perhaps in 
>>>> the interim have builds (with a -bin suffix?) using the prebuilt wheels?
>>> The main reason for building from source is performance and compatibility 
>>> with the OS.
>>> 
>>> The binary wheels that are available for TensorFlow are not compatible with 
>>> older OS versions like CentOS 6, as I experienced first-hand when trying to 
>>> get it to work on an older (GPU) system.
>>> Since the compilation from source with CUDA support didn't work yet, I had 
>>> to resort to injecting a newer glibc version in the 'python' binary, which 
>>> was not fun (well...).
>>> 
>>> For CPU-only installations, you really have no other option than building 
>>> from source, since the binary wheels were not built with AVX2 instructions 
>>> for example, which leads to large performance losses (some quick 
>>> benchmarking showed a 7x increase in performance for TF 1.4 built with 
>>> foss/2017b over using the binary wheel).
>>> 
>>> For GPU installations, a similar concern arises, although it may be less 
>>> severe there, depending on what CUDA compute capabilities the binary wheels 
>>> were built with (I only tested the wheels on old systems with NVIDIA 
>>> K20x/K40 GPUs, so there I doubt you'll get much performance increase when 
>>> building from source).
>>> 
>>> If it turns out to be too difficult or time-consuming to get the build from 
>>> source with CUDA support to work, then we can of course stick with the 
>>> binary wheel releases for now; I'm not going to oppose that.
>>> 
>>> 
>>> regards,
>>> 
>>> Kenneth
>>> 
>>>> Best regards
>>>> 
>>>> Jakob
>>>> 
>>>> 
>>>>> On 4 Jan 2018, at 15:29, Kenneth Hoste <[email protected]> wrote:
>>>>> 
>>>>> Dear Jakob,
>>>>> 
>>>>> On 04/01/2018 10:23, Jakob Schiøtz wrote:
>>>>>> Hi,
>>>>>> 
>>>>>> I made a TensorFlow easyconfig a while ago depending on Python with the 
>>>>>> foss toolchain; and including a variant with GPU support (PR 4904).  The 
>>>>>> latter has not yet been merged, probably because it is annoying to have 
>>>>>> something that can only build on a machine with a GPU (it fails the 
>>>>>> sanity check otherwise, as TensorFlow with GPU support cannot load on a 
>>>>>> machine without it).
>>>>> Not being able to test this on a non-GPU system is a bit unfortunate, but 
>>>>> that's not the reason it hasn't been merged yet; that's mostly due to 
>>>>> a lack of time from my side to get back to it...
>>>>> 
>>>>>> Since I made that PR, two newer releases of TensorFlow have appeared 
>>>>>> (1.3 and 1.4).  There are easyconfigs for 1.3 with the Intel 
>>>>>> toolchain.  I am considering making easyconfigs for TensorFlow 1.4 with 
>>>>>> Python-3.6.3-foss-2017b (both with and without GPU support), but first I 
>>>>>> would like to know if anybody else is doing this - it is my impression 
>>>>>> that somebody who actually knows what they are doing may be working on 
>>>>>> TensorFlow. :-)
>>>>> I have spent quite a bit of time puzzling together an easyblock that 
>>>>> supports building TensorFlow from source, see [1].
>>>>> 
>>>>> It already works for non-GPU installations (see [2] for example), but 
>>>>> it's not entirely finished yet because:
>>>>> 
>>>>> * building from source with CUDA support does not work yet; the build 
>>>>> fails with strange Bazel errors...
>>>>> 
>>>>> * there are some issues when the TensorFlow easyblock is used together 
>>>>> with --use-ccache and the Intel compilers;
>>>>>   because two compiler wrappers are used, they end up calling each other 
>>>>> resulting in a "fork bomb" style situation...
>>>>> 
>>>>> I would really like to get it finished and have easyconfigs available for 
>>>>> TensorFlow 1.4 and newer where we properly build TensorFlow from source 
>>>>> rather than using the binary wheels...
>>>>> 
>>>>> Are you up for giving it a try, and maybe helping out with the problems 
>>>>> mentioned above?
>>>>> 
>>>>> 
>>>>> regards,
>>>>> 
>>>>> Kenneth
>>>>> 
>>>>> 
>>>>> [1] https://github.com/easybuilders/easybuild-easyblocks/pull/1287
>>>>> [2] https://github.com/easybuilders/easybuild-easyconfigs/pull/5499
>>>>> 
>>>>>> Best regards
>>>>>> 
>>>>>> Jakob
>>>>>> 
>>>>>> --
>>>>>> Jakob Schiøtz, professor, Ph.D.
>>>>>> Department of Physics
>>>>>> Technical University of Denmark
>>>>>> DK-2800 Kongens Lyngby, Denmark
>>>>>> http://www.fysik.dtu.dk/~schiotz/
>>>>>> 
>>>>>> 
>>>>>> 
> 

--
Jakob Schiøtz, professor, Ph.D.
Department of Physics
Technical University of Denmark
DK-2800 Kongens Lyngby, Denmark
http://www.fysik.dtu.dk/~schiotz/


