cybaj opened a new issue #18960: URL: https://github.com/apache/incubator-mxnet/issues/18960
## Description

### TL;DR
I want to build an image that contains a library with an `mxnet` dependency, so I added installation of that library and of `mxnet` to my Dockerfile. The `mxnet` package installed fine, but the build failed with `OSError: libcuda.so.1: cannot open shared object file: No such file or directory` during installation of the library. So I added `LD_LIBRARY_PATH` as well. But in that case, unlike before, no GPUs were detected.

### cuda
I used the `nvcr.io/nvidia/pytorch:19.10-py3` image, which contains the following ([ref](https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel_19-10.html#rel_19-10)):
- NVIDIA CUDA 10.1.243 including cuBLAS 10.2.1.243
- NVIDIA cuDNN 7.6.4

So I installed `mxnet-cu101` from PyPI. I have also checked that `libcuda.so.1` exists in `/usr/local/cuda/compat/lib.real`.

### Error Message
1. Cannot install a Python package that has an `mxnet` dependency; the log is below.
```
Step 5/9 : RUN pip install git+https://github.com/cybaj/KoGPT2.git#egg=kogpt2
 ---> Running in 2ede86c70b10
Collecting kogpt2 from git+https://github.com/cybaj/KoGPT2.git#egg=kogpt2
  Cloning https://github.com/cybaj/KoGPT2.git to /tmp/pip-install-kawcrvv2/kogpt2
  Running command git clone -q https://github.com/cybaj/KoGPT2.git /tmp/pip-install-kawcrvv2/kogpt2
    ERROR: Command errored out with exit status 1:
     command: /opt/conda/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-kawcrvv2/kogpt2/setup.py'"'"'; __file__='"'"'/tmp/pip-install-kawcrvv2/kogpt2/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base pip-egg-info
         cwd: /tmp/pip-install-kawcrvv2/kogpt2/
    Complete output (21 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-kawcrvv2/kogpt2/setup.py", line 1, in <module>
        from kogpt2 import __version__
      File "/tmp/pip-install-kawcrvv2/kogpt2/kogpt2/__init__.py", line 15, in <module>
        from . import model
      File "/tmp/pip-install-kawcrvv2/kogpt2/kogpt2/model/__init__.py", line 17, in <module>
        from .gpt import *
      File "/tmp/pip-install-kawcrvv2/kogpt2/kogpt2/model/gpt.py", line 24, in <module>
        import mxnet as mx
      File "/opt/conda/lib/python3.6/site-packages/mxnet/__init__.py", line 24, in <module>
        from .context import Context, current_context, cpu, gpu, cpu_pinned
      File "/opt/conda/lib/python3.6/site-packages/mxnet/context.py", line 24, in <module>
        from .base import classproperty, with_metaclass, _MXClassPropertyMetaClass
      File "/opt/conda/lib/python3.6/site-packages/mxnet/base.py", line 214, in <module>
        _LIB = _load_lib()
      File "/opt/conda/lib/python3.6/site-packages/mxnet/base.py", line 205, in _load_lib
        lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_LOCAL)
      File "/opt/conda/lib/python3.6/ctypes/__init__.py", line 348, in __init__
        self._handle = _dlopen(self._name, mode)
    OSError: libcuda.so.1: cannot open shared object file: No such file or directory
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
The command '/bin/sh -c pip install git+https://github.com/cybaj/KoGPT2.git#egg=kogpt2' returned a non-zero code: 1
```
2. If I add `LD_LIBRARY_PATH` as below (a path that contains `libcuda.so.1`), the installation succeeds, but NO GPU is detected.
```
Step 9/10 : RUN python -c "import torch; print(torch.__version__); print(torch.cuda.device_count());"
 ---> Running in 8273db124d7f
1.3.0a0+24ae9b5
0
Removing intermediate container 8273db124d7f
 ---> a8f922092018
Step 10/10 : RUN python -c "import mxnet; print(mxnet.__version__); print(mxnet.util.get_gpu_count());"
 ---> Running in 59c987a815bc
1.6.0
0
```
Without installing the Python library that needs the `mxnet` dependency, all GPUs are detected.

## To Reproduce
Docker build with the Dockerfile below.
```
FROM nvcr.io/nvidia/pytorch:19.10-py3
ENV LD_LIBRARY_PATH $LD_LIBRARY_PATH:/usr/local/cuda/compat/lib.real
RUN pip install --no-cache-dir mxnet_cu101
RUN pip install --no-cache-dir gluonnlp sentencepiece
RUN pip install git+https://github.com/cybaj/KoGPT2.git#egg=kogpt2 # this needs mxnet
RUN pip install transformers==2.11.0
WORKDIR /workspace
RUN python -c "import torch; print(torch.__version__); print(torch.cuda.device_count());"
RUN python -c "import mxnet; print(mxnet.__version__); print(mxnet.util.get_gpu_count());"
```

### Steps to reproduce
1. Docker build with the Dockerfile above.
2. Check GPU detection:
```
RUN python -c "import torch; print(torch.__version__); print(torch.cuda.device_count());"
RUN python -c "import mxnet; print(mxnet.__version__); print(mxnet.util.get_gpu_count());"
```

## What have you tried to solve it?
1. Used another Docker base image (the 10.12 version), but it failed.
2. Built on another machine (with the latest Docker version), but it also failed.

## Environment
We recommend using our script for collecting the diagnostic information. Run the following command and paste the outputs below:
```
curl --retry 10 -s https://raw.githubusercontent.com/dmlc/gluon-nlp/master/tools/diagnose.py | python
# paste outputs here
```
That diagnose.py URL returns 404 Not Found.
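For quick diagnosis inside the image, the failing import above ultimately reduces to a `ctypes.CDLL` call in `mxnet/base.py`. A minimal sketch that probes the same failure mode without importing all of `mxnet` (the helper name `can_dlopen` is made up for illustration; it assumes a Linux dynamic linker):

```python
import ctypes

def can_dlopen(libname):
    """Return True if the dynamic linker can resolve `libname`.

    Mimics the `ctypes.CDLL(lib_path[0], ctypes.RTLD_LOCAL)` call in
    mxnet/base.py that raised the OSError, so it can be run in a RUN
    step to check whether libcuda.so.1 is visible to dlopen.
    """
    try:
        ctypes.CDLL(libname, ctypes.RTLD_LOCAL)
        return True
    except OSError:
        return False

# Expected to print False in the failing build stage until
# LD_LIBRARY_PATH (or ldconfig) makes the compat libcuda visible.
print(can_dlopen("libcuda.so.1"))
```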
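To separate "libcuda.so.1 is loadable" from "the runtime actually exposes GPUs", a check that does not go through mxnet or torch at all can help; a hedged sketch that counts devices by parsing `nvidia-smi -L` (the helper name `gpu_count_via_nvidia_smi` is an assumption, not an mxnet API; the subprocess flags are chosen for Python 3.6 compatibility, matching the container's interpreter):

```python
import subprocess

def gpu_count_via_nvidia_smi():
    """Count GPUs by parsing `nvidia-smi -L`, which prints one
    'GPU <n>: <name>' line per visible device.

    Returns 0 when nvidia-smi is missing or fails (e.g. no driver
    inside a docker-build container) instead of raising, so its result
    can be compared directly against mxnet.util.get_gpu_count().
    """
    try:
        proc = subprocess.run(
            ["nvidia-smi", "-L"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return 0
    return sum(1 for line in proc.stdout.splitlines()
               if line.startswith("GPU "))

print(gpu_count_via_nvidia_smi())
```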
