On 5/27/21 9:48 AM, Alexander Grund wrote:
The EB log file reports an error:
//tensorflow/core/common_runtime:graph_constructor_test FAILED TO BUILD
and the log file ends with:
Executed 137 out of 814 tests: 137 tests pass, 1 fails to build and 676
were skipped.
FAILED: Build did NOT complete successfully
This is a build failure, so it is something we should fix, or at least
find the cause of.
Please check the log; there should be something about why/how it failed to
compile. Just search for the test name and scroll around a bit. If you
attach it, I can also take a look.
The EB log file is 205 MB, so it's hard to share :-(
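A full log of this size need not be shared whole; a short excerpt around the failing target may be enough. A minimal sketch, assuming GNU grep is available; the function name log_excerpt and the example log path are hypothetical and not part of EasyBuild:

```shell
#!/bin/sh
# Hypothetical helper: print the lines matching a fixed string, plus some
# surrounding context, so a small excerpt can be shared instead of the
# whole log file.
# Usage: log_excerpt LOGFILE PATTERN [CONTEXT_LINES]
log_excerpt() {
    logfile=$1
    pattern=$2
    context=${3:-20}
    # -n: prefix each line with its line number
    # -F: treat the pattern as a fixed string, not a regular expression
    # -C: number of context lines to print before and after each match
    # -e: allows patterns that begin with a dash
    grep -n -F -C "$context" -e "$pattern" "$logfile"
}

# Example call (the log path is an assumption):
# log_excerpt /path/to/easybuild-TensorFlow.log 'No space left on device' 30 > excerpt.txt
```

Matching lines are shown as `N:text` and context lines as `N-text`, which makes it easy to jump to the same place in the full log.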
I have this environment:
export EASYBUILD_BUILDPATH=/run/user/$UID/eb_build
ulimit -s 2000240
export EASYBUILD_TMPDIR=/scratch/$USER
and there is quite a bit of space available:
$ df -h /run/user/$UID/eb_build /scratch
Filesystem                         Size  Used Avail Use% Mounted on
tmpfs                               19G   19G   30M 100% /run/user/983
/dev/mapper/VolGroup00-lv_scratch  850G  675M  849G   1% /scratch
Searching for FAIL in the log file, I noticed this section:
== 2021-05-26 15:20:28,456 tensorflow.py:899 INFO Starting cpu test
== 2021-05-26 15:20:28,457 run.py:225 INFO running cmd: bazel
--output_user_root=/run/user/983/eb_build/TensorFlow/2.4.1/fosscuda-2020b/tmpkYJDaH-bazel-tf
--host_jvm_args=-Xms512m --host_jvm_args=-Xmx4096m test --config=noaws
--config=nogcp --config=nohdfs --compilation_mode=opt --config=opt
--subcommands --verbose_failures
--jobs=64 --copt="-fPIC"
--action_env=CPATH='/home/modules/software/cURL/7.72.0-GCCcore-10.2.0/include:/home/modules/software/double-conversion/3.1.5-GCCcore-10.2.0/include:/home/modules/software/flatbuffers/1.12.0-GCCcore-10.2.0/include:/home/modules/software/giflib/5.2.1-GCCcore-10.2.0/include:/home/modules/software/hwloc/2.2.0-GCCcore-10.2.0/include:/home/modules/software/ICU/67.1-GCCcore-10.2.0/include:/home/modules/software/JsonCpp/1.9.4-GCCcore-10.2.0/include:/home/modules/software/libjpeg-turbo/2.0.5-GCCcore-10.2.0/include:/home/modules/software/libpng/1.6.37-GCCcore-10.2.0/include:/home/modules/software/LMDB/0.9.24-GCCcore-10.2.0/include:/home/modules/software/nsync/1.24.0-GCCcore-10.2.0/include:/home/modules/software/PCRE/8.44-GCCcore-10.2.0/include:/home/modules/software/protobuf/3.14.0-GCCcore-10.2.0/include:/home/modules/software/pybind11/2.6.0-GCCcore-10.2.0/include:/home/modules/software/snappy/1.1.8-GCCcore-10.2.0/include:/home/modules/software/SQLite/3.33.0-GCCcore-10.2.0/include:/home/modules/software/zlib/1.2.11-GCCcore-10.2.0/include'
--action_env=LIBRARY_PATH='/home/modules/software/cURL/7.72.0-GCCcore-10.2.0/lib:/home/modules/software/double-conversion/3.1.5-GCCcore-10.2.0/lib:/home/modules/software/flatbuffers/1.12.0-GCCcore-10.2.0/lib:/home/modules/software/giflib/5.2.1-GCCcore-10.2.0/lib:/home/modules/software/hwloc/2.2.0-GCCcore-10.2.0/lib:/home/modules/software/ICU/67.1-GCCcore-10.2.0/lib:/home/modules/software/JsonCpp/1.9.4-GCCcore-10.2.0/lib:/home/modules/software/libjpeg-turbo/2.0.5-GCCcore-10.2.0/lib64:/home/modules/software/libpng/1.6.37-GCCcore-10.2.0/lib:/home/modules/software/LMDB/0.9.24-GCCcore-10.2.0/lib:/home/modules/software/nsync/1.24.0-GCCcore-10.2.0/lib:/home/modules/software/PCRE/8.44-GCCcore-10.2.0/lib:/home/modules/software/protobuf/3.14.0-GCCcore-10.2.0/lib:/home/modules/software/pybind11/2.6.0-GCCcore-10.2.0/lib:/home/modules/software/snappy/1.1.8-GCCcore-10.2.0/lib:/home/modules/software/SQLite/3.33.0-GCCcore-10.2.0/lib:/home/modules/software/zlib/1.2.11-GCCcore-10.2.0/lib'
--action_env=PYTHONPATH --action_env=PYTHONNOUSERSITE=1
--distinct_host_configuration=false --config=mkl --test_output=errors
--build_tests_only --local_test_jobs=64
--test_tag_filters='-gpu,-tpu,-no_cuda_on_cpu_tap,-no_pip,-no_oss,-oss_serial,-benchmark-test,-v1only'
--build_tag_filters='-gpu,-tpu,-no_cuda_on_cpu_tap,-no_pip,-no_oss,-oss_serial,-benchmark-test,-v1only'
--test_env=CUDA_VISIBLE_DEVICES='-1' --test_timeout=3600
--test_size_filters=small -- //tensorflow/core/...
-//tensorflow/core:example_java_proto
-//tensorflow/core/example:example_protos_closure //tensorflow/cc/...
//tensorflow/c/... //tensorflow/python/...
-//tensorflow/core/profiler/internal/gpu:device_tracer_test
-//tensorflow/c/eager:c_api_test_gpu
-//tensorflow/c/eager:c_api_distributed_test
-//tensorflow/c/eager:c_api_distributed_test_gpu
-//tensorflow/c/eager:c_api_cluster_test_gpu
-//tensorflow/c/eager:c_api_remote_function_test_gpu
-//tensorflow/c/eager:c_api_remote_test_gpu
-//tensorflow/core/kernels:sparse_matmul_op_test
-//tensorflow/core/kernels:sparse_matmul_op_test_gpu
-//tensorflow/core/common_runtime:collective_param_resolver_local_test
-//tensorflow/core/common_runtime:mkl_layout_pass_test
-//tensorflow/core/kernels/mkl:mkl_fused_ops_test
== 2021-05-26 15:30:49,144 run.py:595 INFO parse_log_for_error msg:
Command used: bazel
--output_user_root=/run/user/983/eb_build/TensorFlow/2.4.1/fosscuda-2020b/tmpkYJDaH-bazel-tf
--host_jvm_args=-Xms512m --host_jvm_args=-Xmx4096m test --config=noaws
--config=nogcp --config=nohdfs --compilation_mode=opt --config=opt
--subcommands --verbose_failures --jobs=64 --copt="-fPIC"
--action_env=CPATH='/home/modules/software/cURL/7.72.0-GCCcore-10.2.0/include:/home/modules/software/double-conversion/3.1.5-GCCcore-10.2.0/include:/home/modules/software/flatbuffers/1.12.0-GCCcore-10.2.0/include:/home/modules/software/giflib/5.2.1-GCCcore-10.2.0/include:/home/modules/software/hwloc/2.2.0-GCCcore-10.2.0/include:/home/modules/software/ICU/67.1-GCCcore-10.2.0/include:/home/modules/software/JsonCpp/1.9.4-GCCcore-10.2.0/include:/home/modules/software/libjpeg-turbo/2.0.5-GCCcore-10.2.0/include:/home/modules/software/libpng/1.6.37-GCCcore-10.2.0/include:/home/modules/software/LMDB/0.9.24-GCCcore-10.2.0/include:/home/modules/software/nsync/1.24.0-GCCcore-10.2.0/include:/home/modules/software/PCRE/8.44-GCCcore-10.2.0/include:/home/modules/software/protobuf/3.14.0-GCCcore-10.2.0/include:/home/modules/software/pybind11/2.6.0-GCCcore-10.2.0/include:/home/modules/software/snappy/1.1.8-GCCcore-10.2.0/include:/home/modules/software/SQLite/3.33.0-GCCcore-10.2.0/include:/home/modules/software/zlib/1.2.11-GCCcore-10.2.0/include'
--action_env=LIBRARY_PATH='/home/modules/software/cURL/7.72.0-GCCcore-10.2.0/lib:/home/modules/software/double-conversion/3.1.5-GCCcore-10.2.0/lib:/home/modules/software/flatbuffers/1.12.0-GCCcore-10.2.0/lib:/home/modules/software/giflib/5.2.1-GCCcore-10.2.0/lib:/home/modules/software/hwloc/2.2.0-GCCcore-10.2.0/lib:/home/modules/software/ICU/67.1-GCCcore-10.2.0/lib:/home/modules/software/JsonCpp/1.9.4-GCCcore-10.2.0/lib:/home/modules/software/libjpeg-turbo/2.0.5-GCCcore-10.2.0/lib64:/home/modules/software/libpng/1.6.37-GCCcore-10.2.0/lib:/home/modules/software/LMDB/0.9.24-GCCcore-10.2.0/lib:/home/modules/software/nsync/1.24.0-GCCcore-10.2.0/lib:/home/modules/software/PCRE/8.44-GCCcore-10.2.0/lib:/home/modules/software/protobuf/3.14.0-GCCcore-10.2.0/lib:/home/modules/software/pybind11/2.6.0-GCCcore-10.2.0/lib:/home/modules/software/snappy/1.1.8-GCCcore-10.2.0/lib:/home/modules/software/SQLite/3.33.0-GCCcore-10.2.0/lib:/home/modules/software/zlib/1.2.11-GCCcore-10.2.0/lib'
--action_env=PYTHONPATH --action_env=PYTHONNOUSERSITE=1
--distinct_host_configuration=false --config=mkl --test_output=errors
--build_tests_only --local_test_jobs=64
--test_tag_filters='-gpu,-tpu,-no_cuda_on_cpu_tap,-no_pip,-no_oss,-oss_serial,-benchmark-test,-v1only'
--build_tag_filters='-gpu,-tpu,-no_cuda_on_cpu_tap,-no_pip,-no_oss,-oss_serial,-benchmark-test,-v1only'
--test_env=CUDA_VISIBLE_DEVICES='-1' --test_timeout=3600 --test_size_filters=small --
//tensorflow/core/... -//tensorflow/core:example_java_proto
-//tensorflow/core/example:example_protos_closure //tensorflow/cc/...
//tensorflow/c/... //tensorflow/python/...
-//tensorflow/core/profiler/internal/gpu:device_tracer_test
-//tensorflow/c/eager:c_api_test_gpu
-//tensorflow/c/eager:c_api_distributed_test
-//tensorflow/c/eager:c_api_distributed_test_gpu
-//tensorflow/c/eager:c_api_cluster_test_gpu
-//tensorflow/c/eager:c_api_remote_function_test_gpu
-//tensorflow/c/eager:c_api_remote_test_gpu
-//tensorflow/core/kernels:sparse_matmul_op_test
-//tensorflow/core/kernels:sparse_matmul_op_test_gpu
-//tensorflow/core/common_runtime:collective_param_resolver_local_test
-//tensorflow/core/common_runtime:mkl_layout_pass_test
-//tensorflow/core/kernels/mkl:mkl_fused_ops_test
== 2021-05-26 15:30:49,145 run.py:597 INFO parse_log_for_error (some may
be harmless) regExp (?<![(,-]|\w)(?:error|segmentation
fault|failed)(?![(,-]|\.?\w) found:
WARNING: Download from
https://storage.googleapis.com/mirror.tensorflow.org/github.com/llvm/llvm-project/archive/f402e682d0ef5598eeffc9a21a691b03e602ff58.tar.gz
failed: class
com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException
GET returned 404 Not Found
SUBCOMMAND: # //tensorflow/core/platform:error [action 'Linking
tensorflow/core/platform/liberror.so', configuration:
f6bc5b6107d950b9fac2186352cdfdfe45c6815016e3edc9f32af940b50d30a6,
execution platform: @local_execution_config_platform//:platform]
SUBCOMMAND: # //tensorflow/core/platform:error [action 'Compiling
tensorflow/core/platform/error.cc', configuration:
f6bc5b6107d950b9fac2186352cdfdfe45c6815016e3edc9f32af940b50d30a6,
execution platform: @local_execution_config_platform//:platform]
external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc
-MD -MF bazel-out/k8-opt/bin/tensorflow/core/platform/_objs/error/error.d
'-frandom-seed=bazel-out/k8-opt/bin/tensorflow/core/platform/_objs/error/error.o'
-DEIGEN_MPL2_ONLY '-DEIGEN_MAX_ALIGN_BYTES=64' '-DEIGEN_HAS_TYPE_TRAITS=0'
-D__CLANG_SUPPORT_DYN_ANNOTATION__ -iquote . -iquote bazel-out/k8-opt/bin
-iquote external/eigen_archive -iquote
bazel-out/k8-opt/bin/external/eigen_archive -iquote
external/com_google_absl -iquote
bazel-out/k8-opt/bin/external/com_google_absl -iquote external/nsync
-iquote bazel-out/k8-opt/bin/external/nsync -iquote
external/double_conversion -iquote
bazel-out/k8-opt/bin/external/double_conversion -iquote
external/com_google_protobuf -iquote
bazel-out/k8-opt/bin/external/com_google_protobuf -isystem
third_party/eigen3/mkl_include -isystem
bazel-out/k8-opt/bin/third_party/eigen3/mkl_include -isystem
external/eigen_archive -isystem
bazel-out/k8-opt/bin/external/eigen_archive -Wno-builtin-macro-redefined
'-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"'
'-D__TIME__="redacted"' -fPIE -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1'
-fstack-protector -Wall -fno-omit-frame-pointer -no-canonical-prefixes
-fno-canonical-system-headers -DNDEBUG -g0 -O2 -ffunction-sections
-fdata-sections -w -DAUTOLOAD_DYNAMIC_KERNELS -O2
-ftree-vectorize '-march=native' -fno-math-errno -fPIC -fPIC '-std=c++14'
-c tensorflow/core/platform/error.cc -o
bazel-out/k8-opt/bin/tensorflow/core/platform/_objs/error/error.o)
SUBCOMMAND: # //tensorflow/core/platform:error [action 'Linking
tensorflow/core/platform/liberror.a', configuration:
f6bc5b6107d950b9fac2186352cdfdfe45c6815016e3edc9f32af940b50d30a6,
execution platform: @local_execution_config_platform//:platform]
ERROR:
/run/user/983/eb_build/TensorFlow/2.4.1/fosscuda-2020b/TensorFlow/tensorflow-2.4.1/tensorflow/core/common_runtime/BUILD:2700:11:
Linking of rule '//tensorflow/core/common_runtime:graph_constructor_test'
failed (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error
executing command
/home/modules/software/binutils/2.35-GCCcore-10.2.0/bin/ld.gold: fatal
error:
bazel-out/k8-opt/bin/tensorflow/core/common_runtime/graph_constructor_test:
No space left on device
collect2: error: ld returned 1 exit status
FAILED: Build did NOT complete successfully
//tensorflow/core/common_runtime:graph_constructor_test FAILED TO
BUILD
FAILED: Build did NOT complete successfully
== 2021-05-26 15:30:49,145 run.py:554 WARNING Found 11 errors in command
output (output: WARNING: Download from
https://storage.googleapis.com/mirror.tensorflow.org/github.com/llvm/llvm-project/archive/f402e682d0ef5598eeffc9a21a691b03e602ff58.tar.gz
failed: class
com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException
GET returned 404 Not Found
SUBCOMMAND: # //tensorflow/core/platform:error [action 'Linking
tensorflow/core/platform/liberror.so', configuration:
f6bc5b6107d950b9fac2186352cdfdfe45c6815016e3edc9f32af940b50d30a6,
execution platform: @local_execution_config_platform//:platform]
SUBCOMMAND: # //tensorflow/core/platform:error [action 'Compiling
tensorflow/core/platform/error.cc', configuration:
f6bc5b6107d950b9fac2186352cdfdfe45c6815016e3edc9f32af940b50d30a6,
execution platform: @local_execution_config_platform//:platform]
external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc
-MD -MF bazel-out/k8-opt/bin/tensorflow/core/platform/_objs/error/error.d
'-frandom-seed=bazel-out/k8-opt/bin/tensorflow/core/platform/_objs/error/error.o'
-DEIGEN_MPL2_ONLY '-DEIGEN_MAX_ALIGN_BYTES=64' '-DEIGEN_HAS_TYPE_TRAITS=0'
-D__CLANG_SUPPORT_DYN_ANNOTATION__ -iquote . -iquote bazel-out/k8-opt/bin
-iquote external/eigen_archive -iquote
bazel-out/k8-opt/bin/external/eigen_archive -iquote
external/com_google_absl -iquote
bazel-out/k8-opt/bin/external/com_google_absl -iquote external/nsync
-iquote bazel-out/k8-opt/bin/external/nsync -iquote
external/double_conversion -iquote
bazel-out/k8-opt/bin/external/double_conversion -iquote
external/com_google_protobuf -iquote
bazel-out/k8-opt/bin/external/com_google_protobuf -isystem
third_party/eigen3/mkl_include -isystem
bazel-out/k8-opt/bin/third_party/eigen3/mkl_include -isystem
external/eigen_archive -isystem
bazel-out/k8-opt/bin/external/eigen_archive -Wno-builtin-macro-redefined
'-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"'
'-D__TIME__="redacted"' -fPIE -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1'
-fstack-protector -Wall -fno-omit-frame-pointer -no-canonical-prefixes
-fno-canonical-system-headers -DNDEBUG -g0 -O2 -ffunction-sections
-fdata-sections -w -DAUTOLOAD_DYNAMIC_KERNELS -O2 -ftree-vectorize
'-march=native' -fno-math-errno -fPIC -fPIC '-std=c++14' -c
tensorflow/core/platform/error.cc -o
bazel-out/k8-opt/bin/tensorflow/core/platform/_objs/error/error.o)
SUBCOMMAND: # //tensorflow/core/platform:error [action 'Linking
tensorflow/core/platform/liberror.a', configuration:
f6bc5b6107d950b9fac2186352cdfdfe45c6815016e3edc9f32af940b50d30a6,
execution platform: @local_execution_config_platform//:platform]
ERROR:
/run/user/983/eb_build/TensorFlow/2.4.1/fosscuda-2020b/TensorFlow/tensorflow-2.4.1/tensorflow/core/common_runtime/BUILD:2700:11:
Linking of rule '//tensorflow/core/common_runtime:graph_constructor_test'
failed (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error
executing command
/home/modules/software/binutils/2.35-GCCcore-10.2.0/bin/ld.gold:
fatal error:
bazel-out/k8-opt/bin/tensorflow/core/common_runtime/graph_constructor_test:
No space left on device
collect2: error: ld returned 1 exit status
FAILED: Build did NOT complete successfully
//tensorflow/core/common_runtime:graph_constructor_test
FAILED TO BUILD
FAILED: Build did NOT complete successfully)
Please note these two errors:
WARNING: Download from
https://storage.googleapis.com/mirror.tensorflow.org/github.com/llvm/llvm-project/archive/f402e682d0ef5598eeffc9a21a691b03e602ff58.tar.gz
failed: class
com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException
GET returned 404 Not Found
Is the URL outdated?
/home/modules/software/binutils/2.35-GCCcore-10.2.0/bin/ld.gold: fatal error:
bazel-out/k8-opt/bin/tensorflow/core/common_runtime/graph_constructor_test: No
space left on device
What device might that be? As shown above, I have quite a bit of disk
space. Is /tmp being used and getting full?
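One way to narrow down which device is full is to check the filesystem behind each directory the build writes to. Note that the df output above lists the tmpfs under /run/user/983 at 100% use with 30M available, and the bazel --output_user_root in the log points under /run/user/983/eb_build, so that may be the device in question. A rough sketch; the helper name avail_kb and the fallback paths are assumptions:

```shell
#!/bin/sh
# Hypothetical helper: print the available space, in KiB, of the filesystem
# that holds the given directory. Prints nothing if df cannot stat the path.
avail_kb() {
    # df -P: POSIX output format, one line per filesystem
    # df -k: report sizes in 1024-byte units
    # Line 2, column 4 is the "Available" figure.
    df -Pk "$1" 2>/dev/null | awk 'NR == 2 { print $4 }'
}

# Candidate locations: the EasyBuild build path and tmpdir from the
# environment shown above (with assumed fallbacks), plus /tmp.
for dir in "${EASYBUILD_BUILDPATH:-/run/user/$(id -u)/eb_build}" \
           "${EASYBUILD_TMPDIR:-/tmp}" \
           /tmp; do
    kb=$(avail_kb "$dir")
    printf '%-40s %s KiB available\n' "$dir" "${kb:-unknown}"
done
```

Running this again while the bazel test step is in progress would show whether one of these locations runs out of space during linking.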
I'd also suggest joining Slack, as discussions there are potentially faster.
I'll take a look - are there instructions for Slack?
Thanks,
Ole
--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark,
Fysikvej Building 309, DK-2800 Kongens Lyngby, Denmark
E-mail: [email protected]
Homepage: http://dcwww.fysik.dtu.dk/~ohnielse/
Mobile: (+45) 5180 1620