larroy opened a new issue #16950: Failure on docker run for ubuntu_build_cuda
URL: https://github.com/apache/incubator-mxnet/issues/16950
 
 
   ## Description
   
   There seems to be a problem with nvidia-docker on EC2 when running in a 
recent 18.04 environment.
   
   ```
   time ci/build.py --no-cache --nvidiadocker --platform ubuntu_build_cuda /work/runtime_functions.sh build_ubuntu_gpu_cuda101_cudnn7
   ```
   
   
   ```
   
   
   Traceback (most recent call last):
     File "/home/piotr/.local/lib/python3.6/site-packages/docker/api/client.py", line 261, in _raise_for_status
       response.raise_for_status()
     File "/home/piotr/.local/lib/python3.6/site-packages/requests/models.py", line 940, in raise_for_status
       raise HTTPError(http_error_msg, response=self)
   requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http+docker://localhost/v1.35/containers/4fafc8918ee1a6c4897cfc5f107cd1aad43b1cfbc9ecc2e438b9d4681c3e9824/start
   
   During handling of the above exception, another exception occurred:
   
   Traceback (most recent call last):
     File "ci/build.py", line 454, in <module>
       sys.exit(main())
     File "ci/build.py", line 378, in main
       local_ccache_dir=args.ccache_dir, environment=environment)
     File "ci/build.py", line 248, in container_run
       environment=environment)
     File "/home/piotr/mxnet/ci/safe_docker_run.py", line 123, in run
       container = self._add_container(self._docker_client.containers.run(*args, **kwargs))
     File "/home/piotr/.local/lib/python3.6/site-packages/docker/models/containers.py", line 809, in run
       container.start()
     File "/home/piotr/.local/lib/python3.6/site-packages/docker/models/containers.py", line 400, in start
       return self.client.api.start(self.id, **kwargs)
     File "/home/piotr/.local/lib/python3.6/site-packages/docker/utils/decorators.py", line 19, in wrapped
       return f(self, resource_id, *args, **kwargs)
     File "/home/piotr/.local/lib/python3.6/site-packages/docker/api/container.py", line 1095, in start
       self._raise_for_status(res)
     File "/home/piotr/.local/lib/python3.6/site-packages/docker/api/client.py", line 263, in _raise_for_status
       raise create_api_error_from_http_exception(e)
     File "/home/piotr/.local/lib/python3.6/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception
       raise cls(e, response=response, explanation=explanation)
   docker.errors.APIError: 500 Server Error: Internal Server Error ("OCI runtime create failed: container_linux.go:346: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/1e6555a578e7365e7439fbfa43d6c5e82fcee89bd5438b907c3c4b8c3ea08011/merged/usr/bin/nvidia-smi: file exists\\\\n\\\"\"": unknown")
   Command exited with non-zero status 1
   1.87user 0.84system 27:34.59elapsed 0%CPU (0avgtext+0avgdata 68112maxresident)k
   0inputs+0outputs (0major+38834minor)pagefaults 0swaps
   piotr@34-215-40-140:1: ~/mxnet [upstream_master]> time ci/build.py --no-cache --nvidiadocker --platform ubuntu_build_cuda /work/runtime_functions.sh build_ubuntu_gpu_cuda101_cudnn7
   
   ```
