leezu commented on pull request #18542:
URL: https://github.com/apache/incubator-mxnet/pull/18542#issuecomment-645790195


   @yzhliu It should be the other way round. Let's open the CI Docker 
container with `docker run -it mxnetci/build.ubuntu_gpu_cu102 /bin/bash` and 
look at the shared libraries under `/usr/local/cuda-10.2`:
   
   ```
   root@de49f0e1966c:/work/mxnet# find /usr/local/cuda-10.2 -name "*.so*"
   /usr/local/cuda-10.2/compat/libnvidia-ptxjitcompiler.so.440.33.01
   /usr/local/cuda-10.2/compat/libcuda.so
   /usr/local/cuda-10.2/compat/libcuda.so.1
   /usr/local/cuda-10.2/compat/libcuda.so.440.33.01
   /usr/local/cuda-10.2/compat/libnvidia-fatbinaryloader.so.440.33.01
   /usr/local/cuda-10.2/compat/libnvidia-ptxjitcompiler.so
   /usr/local/cuda-10.2/compat/libnvidia-ptxjitcompiler.so.1
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so.10.2
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so.10.2.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcupti.so.10.2
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppim.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppc.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppicc.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcurand.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnpps.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppial.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libOpenCL.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvrtc.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppist.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcuinj64.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libOpenCL.so.1.1
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppig.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppidei.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusolver.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libaccinj64.so.10.2.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppicom.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libaccinj64.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libOpenCL.so.1
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppif.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcufftw.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libaccinj64.so.10.2
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusolverMg.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcuinj64.so.10.2
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcupti.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusparse.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvgraph.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppim.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppc.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppicc.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcurand.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnpps.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppial.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnvrtc.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppist.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcuda.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnvidia-ml.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppig.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppidei.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcusolver.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppicom.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppif.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcufftw.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcusolverMg.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcusparse.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnvgraph.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnvjpeg.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppisu.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppitc.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcufft.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvjpeg.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcuinj64.so.10.2.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvperf_target.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppisu.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppitc.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcupti.so.10.2.75
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvperf_host.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcufft.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvToolsExt.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppisu.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppist.so.10.2.1.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvjpeg.so.10.3.1.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppitc.so.10.2.1.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusparse.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcurand.so.10.1.2.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvrtc.so.10.2.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppif.so.10.2.1.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnpps.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppc.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppial.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnpps.so.10.2.1.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppidei.so.10.2.1.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppc.so.10.2.1.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppicom.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvrtc-builtins.so.10.2
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvToolsExt.so.1
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcufftw.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusolverMg.so.10.3.0.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcufftw.so.10.1.2.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppicc.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppicc.so.10.2.1.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcurand.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppicom.so.10.2.1.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusolverMg.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppial.so.10.2.1.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppist.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusparse.so.10.3.1.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvrtc-builtins.so.10.2.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusolver.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvgraph.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppim.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvToolsExt.so.1.0.0
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvrtc-builtins.so
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcufft.so.10.1.2.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppidei.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcufft.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppig.so.10.2.1.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvjpeg.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvrtc.so.10.2
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppig.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusolver.so.10.3.0.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppif.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppisu.so.10.2.1.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppitc.so.10
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppim.so.10.2.1.89
   /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvgraph.so.10.2.89
   /usr/local/cuda-10.2/nvvm/lib64/libnvvm.so.3.3.0
   /usr/local/cuda-10.2/nvvm/lib64/libnvvm.so
   /usr/local/cuda-10.2/nvvm/lib64/libnvvm.so.3
   /usr/local/cuda-10.2/nvvmx/lib64/libnvvm.so.3.3.0
   /usr/local/cuda-10.2/nvvmx/lib64/libnvvm.so
   /usr/local/cuda-10.2/nvvmx/lib64/libnvvm.so.3
   /usr/local/cuda-10.2/extras/Sanitizer/libsanitizer-public.so
   ```
   
   Because we don't use the nvidia docker command to run the container, only 
`stubs/libcuda.so` is available. On a host with GPUs, we can instead run 
`docker run --gpus all -it mxnetci/build.ubuntu_gpu_cu102 /bin/bash`, which 
makes the host's `libcuda.so` and the host GPUs available inside the 
container. On a CPU-only host, however, this just fails with:
   
   ```
   docker: Error response from daemon: OCI runtime create failed: 
container_linux.go:349: starting container process caused 
"process_linux.go:449: container init caused \"process_linux.go:432: running 
prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: 
nvidia-container-cli: initialization error: nvml error: driver not 
loaded\\\\n\\\"\"": unknown.
   ERRO[0000] error waiting for container: context canceled
   ```
   
   The problem is that some part of the tvmop setup currently requires 
`libcuda.so` to be available (it's listed as a shared library dependency of 
some shared library that is opened). We need to check which library introduces 
the dependency and consider how to fix it. Ideally there shouldn't be a 
dependency on `libcuda.so` at all, as it's only available on GPU hosts.
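
One way to track down which library introduces the dependency is to scan the build output for shared objects whose dynamic section lists `libcuda.so` as `NEEDED`. The following is only a rough sketch, not part of the MXNet or TVM tooling: it assumes `readelf` (binutils) is installed in the container, and the helper names `needed_libraries` and `libraries_needing` are made up for illustration.

```python
import re
import subprocess
from pathlib import Path
from typing import Dict, List

# Matches lines such as:
#  0x0000000000000001 (NEEDED)  Shared library: [libcuda.so.1]
NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library:\s+\[(?P<name>[^\]]+)\]")


def needed_libraries(readelf_output: str) -> List[str]:
    """Extract the DT_NEEDED entries from the text printed by `readelf -d`."""
    return [m.group("name") for m in NEEDED_RE.finditer(readelf_output)]


def libraries_needing(root: str, wanted: str = "libcuda.so") -> Dict[str, List[str]]:
    """Map each shared object under `root` to its NEEDED entries starting with `wanted`.

    Hypothetical helper: walks `root`, runs `readelf -d` on every regular
    file whose name contains ".so", and keeps only the files that depend
    directly on `wanted`.
    """
    hits: Dict[str, List[str]] = {}
    for path in sorted(Path(root).rglob("*.so*")):
        # Skip symlinks so each real file is reported once.
        if path.is_symlink() or not path.is_file():
            continue
        result = subprocess.run(
            ["readelf", "-d", str(path)],
            capture_output=True,
            text=True,
            check=False,  # non-ELF files just produce no NEEDED lines
        )
        matches = [n for n in needed_libraries(result.stdout) if n.startswith(wanted)]
        if matches:
            hits[str(path)] = matches
    return hits
```

Calling `libraries_needing("build")` (the directory name is an assumption) inside the container would list the libraries that link directly against `libcuda.so`. Note that this only reports direct dependencies; a library pulled in transitively would show up under whichever library names it as `NEEDED`.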



