leezu commented on a change in pull request #16722: Remove unused files in
Website doc
URL: https://github.com/apache/incubator-mxnet/pull/16722#discussion_r407320590
##########
File path: ci/safe_docker_run.py
##########
@@ -54,7 +54,7 @@ def _trim_container_id(cid):
         return cid[:12]
     def __init__(self):
-        self._docker_client = docker.from_env()
+        self._docker_client = docker.from_env(timeout=None)
Review comment:
@marcoabreu FYI after rebuilding our Unix CPU Jenkins Slave AMI via the
scripts in https://github.com/apache/incubator-mxnet-ci/ we intermittently run
into https://github.com/docker/docker-py/issues/2266 .
The root cause is that when a new instance is spawned and booted, the Docker
agent is started, and Jenkins immediately requests a container, Docker may take
longer than 60 seconds to finish starting up. In the
[log](http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fedge/detail/PR-16722/8/pipeline),
lines 294 to 295 show that the startup took 70 seconds, triggering the
timeout.
The issue is tracked upstream at
https://github.com/docker/docker-py/issues/2266 and appears to be a regression
in Docker. (Note that we were previously using a 2-year-old Docker version, as
the AMI hadn't been regenerated.)
Considering we're talking about a socket connecting to localhost, we have no
risk of connection issues and do not need to use a timeout.
I added the fix to this PR because it hit the issue and is otherwise ready to
be merged.
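To illustrate the failure mode described above, here is a minimal sketch that uses plain localhost TCP sockets from the Python standard library rather than docker-py itself. The helper names (`slow_server`, `request`, `demo`) are hypothetical, and the delays are scaled down: a 0.3 s reply delay stands in for the 70 s Docker startup and a 0.1 s client timeout stands in for the 60 s timeout. It only shows that a finite read timeout fails against a slow local peer, while `timeout=None` waits until the peer answers.

```python
import socket
import threading
import time


def slow_server(listener, delay):
    """Accept one connection and reply only after `delay` seconds."""
    conn, _ = listener.accept()
    with conn:
        time.sleep(delay)  # stands in for a slow daemon startup
        try:
            conn.sendall(b"ready")
        except OSError:
            pass  # the client may already have given up and closed


def request(port, timeout):
    """Connect to localhost and wait for a reply.

    timeout=None puts the socket in blocking mode, so recv() waits
    for as long as the peer needs.
    """
    with socket.create_connection(("127.0.0.1", port), timeout=timeout) as sock:
        try:
            return sock.recv(16).decode()
        except socket.timeout:
            return "timed out"


def demo(delay, timeout):
    """Run one slow server and one client request against it."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    worker = threading.Thread(target=slow_server, args=(listener, delay))
    worker.start()
    try:
        return request(port, timeout)
    finally:
        worker.join()
        listener.close()


if __name__ == "__main__":
    print(demo(delay=0.3, timeout=0.1))   # finite timeout, slow peer
    print(demo(delay=0.3, timeout=None))  # no timeout, same slow peer
```

The trade-off of `timeout=None` is that a peer that never answers blocks the caller indefinitely, which is the risk the comment above judges acceptable for a localhost connection.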