larroy opened a new issue #16950: Failure on docker run for ubuntu_build_cuda URL: https://github.com/apache/incubator-mxnet/issues/16950 ## Description There seems to be a problem with nvidia docker in EC2 when running in recent 18.04 environment. ``` time ci/build.py --no-cache --nvidiadocker --platform ubuntu_build_cuda /work/runtime_functions.sh build_ubuntu_gpu_cuda101_cudnn7 ``` ``` Traceback (most recent call last): File "/home/piotr/.local/lib/python3.6/site-packages/docker/api/client.py", line 261, in _raise_for_status response.raise_for_status() File "/home/piotr/.local/lib/python3.6/site-packages/requests/models.py", line 940, in raise_for_status raise HTTPError(http_error_msg, response=self) requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http+docker://localhost/v1.35/containers/4fafc8918ee1a6c4897cfc5f107cd1aad43b1cfbc9ecc2e438b9d4681c3e9824/start During handling of the above exception, another exception occurred: Traceback (most recent call last): File "ci/build.py", line 454, in <module> sys.exit(main()) File "ci/build.py", line 378, in main local_ccache_dir=args.ccache_dir, environment=environment) File "ci/build.py", line 248, in container_run environment=environment) File "/home/piotr/mxnet/ci/safe_docker_run.py", line 123, in run container = self._add_container(self._docker_client.containers.run(*args, **kwargs)) File "/home/piotr/.local/lib/python3.6/site-packages/docker/models/containers.py", line 809, in run container.start() File "/home/piotr/.local/lib/python3.6/site-packages/docker/models/containers.py", line 400, in start return self.client.api.start(self.id, **kwargs) File "/home/piotr/.local/lib/python3.6/site-packages/docker/utils/decorators.py", line 19, in wrapped return f(self, resource_id, *args, **kwargs) File "/home/piotr/.local/lib/python3.6/site-packages/docker/api/container.py", line 1095, in start self._raise_for_status(res) File "/home/piotr/.local/lib/python3.6/site-packages/docker/api/client.py", line 263, in _raise_for_status raise create_api_error_from_http_exception(e) File "/home/piotr/.local/lib/python3.6/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception raise cls(e, response=response, explanation=explanation) docker.errors.APIError: 500 Server Error: Internal Server Error ("OCI runtime create failed: container_linux.go:346: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/1e6555a578e7365e7439fbfa43d6c5e82fcee89bd5438b907c3c4b8c3ea08011/merged/usr/bin/nvidia-smi: file exists\\\\n\\\"\"": unknown") Command exited with non-zero status 1 1.87user 0.84system 27:34.59elapsed 0%CPU (0avgtext+0avgdata 68112maxresident)k 0inputs+0outputs (0major+38834minor)pagefaults 0swaps piotr@34-215-40-140:1: ~/mxnet [upstream_master]> time ci/build.py --no-cache --nvidiadocker --platform ubuntu_build_cuda /work/runtime_functions.sh build_ubuntu_gpu_cuda101_cudnn7 ```
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services
