[ https://issues.apache.org/jira/browse/MESOS-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16099717#comment-16099717 ]
Till Toenshoff commented on MESOS-7730:
---------------------------------------

[~klueska] Which patch fixes this?

> CUDA not working anymore on 1.3.0
> ---------------------------------
>
>                 Key: MESOS-7730
>                 URL: https://issues.apache.org/jira/browse/MESOS-7730
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>    Affects Versions: 1.3.0
>            Reporter: Adam Cecile
>            Assignee: Kevin Klues
>             Fix For: 1.3.1
>
>
> Hello,
> My Docker container using CUDA does not detect it anymore.
> Here is the TensorFlow output with 1.2.1:
> {noformat}
> I0628 12:39:45.505900 16309 exec.cpp:162] Version: 1.2.1
> I0628 12:39:45.508358 16301 exec.cpp:237] Executor registered on agent
> 84c99d0b-8551-4f30-a9bc-6c1edbf7c18c-S1
> I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA
> library libcublas.so.8.0 locally
> I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA
> library libcudnn.so.5 locally
> I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA
> library libcufft.so.8.0 locally
> I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA
> library libcuda.so.1 locally
> I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA
> library libcurand.so.8.0 locally
> W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library
> wasn't compiled to use SSE3 instructions, but these are available on your
> machine and could speed up CPU computations.
> W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library
> wasn't compiled to use SSE4.1 instructions, but these are available on your
> machine and could speed up CPU computations.
> W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library
> wasn't compiled to use SSE4.2 instructions, but these are available on your
> machine and could speed up CPU computations.
> W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library
> wasn't compiled to use AVX instructions, but these are available on your
> machine and could speed up CPU computations.
> W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library
> wasn't compiled to use AVX2 instructions, but these are available on your
> machine and could speed up CPU computations.
> W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library
> wasn't compiled to use FMA instructions, but these are available on your
> machine and could speed up CPU computations.
> I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with
> properties:
> name: GeForce GTX 1080
> major: 6 minor: 1 memoryClockRate (GHz) 1.7335
> pciBusID 0000:82:00.0
> Total memory: 7.92GiB
> Free memory: 7.81GiB
> I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
> I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
> I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow
> device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id:
> 0000:82:00.0)
> {noformat}
> And with 1.3.0
> {noformat}
> I0628 12:40:30.833947 16854 exec.cpp:162] Version: 1.3.0
> I0628 12:40:30.836612 16845 exec.cpp:237] Executor registered on agent
> 84c99d0b-8551-4f30-a9bc-6c1edbf7c18c-S1
> I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA
> library libcublas.so.8.0 locally
> I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA
> library libcudnn.so.5 locally
> I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA
> library libcufft.so.8.0 locally
> I tensorflow/stream_executor/dso_loader.cc:126] Couldn't open CUDA library
> libcuda.so.1. LD_LIBRARY_PATH:
> I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname:
> zelda.service.earthlab.lu
> I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported
> version is: Not found: was unable to find libcuda.so DSO loaded into this
> program
> I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:363] driver version
> file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 375.66 Mon
> May 1 15:29:16 PDT 2017
> GCC version: gcc version 4.9.2 (Debian 4.9.2-10)
> """
> I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported
> version is: 375.66.0
> I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1065] LD_LIBRARY_PATH:
> I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1066] failed to find
> libcuda.so on this system: Failed precondition: could not dlopen DSO:
> libcuda.so.1; dlerror: libcuda.so.1: cannot open shared object file: No such
> file or directory
> I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA
> library libcurand.so.8.0 locally
> W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library
> wasn't compiled to use SSE3 instructions, but these are available on your
> machine and could speed up CPU computations.
> W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library
> wasn't compiled to use SSE4.1 instructions, but these are available on your
> machine and could speed up CPU computations.
> W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library
> wasn't compiled to use SSE4.2 instructions, but these are available on your
> machine and could speed up CPU computations.
> W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library
> wasn't compiled to use AVX instructions, but these are available on your
> machine and could speed up CPU computations.
> W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library
> wasn't compiled to use AVX2 instructions, but these are available on your
> machine and could speed up CPU computations.
> W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library
> wasn't compiled to use FMA instructions, but these are available on your
> machine and could speed up CPU computations.
> E tensorflow/stream_executor/cuda/cuda_driver.cc:509] failed call to cuInit:
> CUDA_ERROR_NO_DEVICE
> I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:158] retrieving CUDA
> diagnostic information for host: zelda.service.earthlab.lu
> I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname:
> zelda.service.earthlab.lu
> I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported
> version is: Not found: was unable to find libcuda.so DSO loaded into this
> program
> I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:363] driver version
> file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 375.66 Mon
> May 1 15:29:16 PDT 2017
> GCC version: gcc version 4.9.2 (Debian 4.9.2-10)
> """
> I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported
> version is: 375.66.0
> {noformat}
> All I did was upgrade/downgrade the mesos package and restart the container.
> I ran the test several times and it is 100% reproducible.
> Regards, Adam.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
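
The 1.3.0 log above fails at the point where TensorFlow tries to dlopen libcuda.so.1, while the other CUDA libraries still load. A minimal sketch for checking the same thing from inside the container without TensorFlow, assuming Python is available in the image: the helper name can_dlopen is hypothetical, and the library names are simply the ones that appear in the log. It only reports whether the dynamic loader can open each library; it does not say why a library is missing.

```python
import ctypes
import os


def can_dlopen(soname):
    """Try to load a shared library by name.

    Returns (True, "") when the dynamic loader can open it,
    otherwise (False, <loader error message>).
    """
    try:
        ctypes.CDLL(soname)
    except OSError as exc:
        return False, str(exc)
    return True, ""


if __name__ == "__main__":
    # The report shows an empty LD_LIBRARY_PATH, so print it for comparison.
    print("LD_LIBRARY_PATH:", os.environ.get("LD_LIBRARY_PATH", ""))
    # Library names copied from the TensorFlow log in the report.
    for soname in ("libcuda.so.1", "libcublas.so.8.0"):
        ok, err = can_dlopen(soname)
        print(soname, "OK" if ok else "FAILED: " + err)
```

Running this in the same container under 1.2.1 and under 1.3.0 should show whether the difference is limited to libcuda.so.1, as the TensorFlow output suggests.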