Loris Bennett <[email protected]> writes:

> Hi, 
>
> With EB 4.4.2, my attempt to build
>
>   Keras-2.3.1-fosscuda-2019b-Python-3.7.4.eb
>
> is failing on the dependency
>
>   TensorFlow-2.1.0-fosscuda-2019b-Python-3.7.4
>
> with
>
>   ERROR: /trinity/shared/easybuild/build/TensorFlow/2.1.0/fosscuda-2019b-Python-3.7.4/TensorFlow/tensorflow-2.1.0/tensorflow/core/platform/BUILD:313:1: C++ compilation of rule '//tensorflow/core/platform:numbers' failed (Exit 1)
>   tensorflow/core/platform/numbers.cc:28:10: fatal error: double-conversion/double-conversion.h: No such file or directory
>
> but earlier in the log I have
>
>   == 2021-09-28 14:19:59,377 modules.py:617 INFO Checking whether double-conversion/3.1.4-GCCcore-8.3.0 exists...
>   == 2021-09-28 14:19:59,377 modules.py:622 INFO Module double-conversion/3.1.4-GCCcore-8.3.0 exists (found in list of available modules)
>   == 2021-09-28 14:19:59,377 modules.py:645 INFO Result for existence check of double-conversion/3.1.4-GCCcore-8.3.0 module: True
>
> Does this ring any bells for anyone?

I tried building TensorFlow directly via

   eb TensorFlow-2.1.0-fosscuda-2019b-Python-3.7.4.eb --robot --cuda-compute-capabilities=6.1,7.5

but got the error

  == 2021-09-28 15:32:01,661 filetools.py:1635 INFO Adjusting permissions recursively for /trinity/shared/easybuild/build/TensorFlow/2.1.0/fosscuda-2019b-Python-3.7.4
  == 2021-09-28 15:32:18,381 build_log.py:169 ERROR EasyBuild crashed with an error (at easybuild/software/EasyBuild/4.4.2/lib/python2.7/site-packages/easybuild/base/exceptions.py:124 in __init__): Failed to chmod/chown several paths: ['/trinity/shared/easybuild/build/TensorFlow/2.1.0/fosscuda-2019b-Python-3.7.4/tmpmA5Ye_-bazel-tf/9b6ce34a74d722f472939a0388a2a554/server/.nfs00000000a98ef9d700000018'] (last error: [Errno 2] No such file or directory: '/trinity/shared/easybuild/build/TensorFlow/2.1.0/fosscuda-2019b-Python-3.7.4/tmpmA5Ye_-bazel-tf/9b6ce34a74d722f472939a0388a2a554/server/.nfs00000000a98ef9d700000018') (at easybuild/software/EasyBuild/4.4.2/lib/python2.7/site-packages/easybuild/tools/filetools.py:1707 in adjust_permissions)

This looks like a problem with the NFS share '/trinity/shared', which is
mounted read-write on the GPU node I am compiling on and which contains
the whole EasyBuild setup, including the build directory.
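
The path in the error ends in '.nfs00000000a98ef9d700000018', which is
the kind of placeholder an NFS client leaves behind when a file is
deleted while a process still has it open; it vanishes once the last
handle is closed, which would explain the "No such file or directory"
during the recursive chmod. A minimal sketch for spotting such files
while a build is running (list_nfs_placeholders is a hypothetical helper
name, not part of EasyBuild):

```shell
# List NFS placeholder files (.nfsXXXX) below a directory.
# Hypothetical helper for illustration; not an EasyBuild command.
list_nfs_placeholders() {
    dir=${1:-.}
    # Errors are discarded because placeholders can disappear mid-scan.
    find "$dir" -type f -name '.nfs*' 2>/dev/null
}
```

Running it against the build directory, e.g.
list_nfs_placeholders /trinity/shared/easybuild/build, should print
nothing if the build path is not affected.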

With

   eb TensorFlow-2.1.0-fosscuda-2019b-Python-3.7.4.eb --robot --cuda-compute-capabilities=6.1,7.5 --buildpath=/dev/shm --tmpdir=/scratch/eb-build

I get back to the original error, i.e. the missing
double-conversion/double-conversion.h header.
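
Since the module is reported as existing but the compiler cannot find
the header, one thing worth checking is whether the header is actually
visible on the include path once the module is loaded. A rough sketch,
assuming the module exports its include directory via CPATH (I have not
verified that for this module); find_header is a hypothetical helper
name:

```shell
# Search a colon-separated include path list (such as $CPATH) for a header.
# Prints the first matching full path and returns 0; returns 1 if not found.
# Hypothetical helper for illustration; not an EasyBuild command.
find_header() {
    header=$1
    search_path=$2
    old_ifs=$IFS
    IFS=:
    # The list is split on ':' once, when the loop starts.
    for dir in $search_path; do
        IFS=$old_ifs
        if [ -n "$dir" ] && [ -f "$dir/$header" ]; then
            printf '%s\n' "$dir/$header"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}
```

After "module load double-conversion/3.1.4-GCCcore-8.3.0", calling
find_header double-conversion/double-conversion.h "$CPATH" should print
the header's location if the module provides it that way. Even then, the
Bazel build may not pass those paths on to the compiler, so a hit here
does not rule out a problem on the TensorFlow side.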

Cheers,

Loris

-- 
Dr. Loris Bennett (Hr./Mr.)
ZEDAT, Freie Universität Berlin         Email [email protected]
