leezu commented on a change in pull request #16722: Remove unused files in 
Website doc
URL: https://github.com/apache/incubator-mxnet/pull/16722#discussion_r407320590
 
 

 ##########
 File path: ci/safe_docker_run.py
 ##########
 @@ -54,7 +54,7 @@ def _trim_container_id(cid):
         return cid[:12]
 
     def __init__(self):
-        self._docker_client = docker.from_env()
+        self._docker_client = docker.from_env(timeout=None)
 
 Review comment:
   @marcoabreu FYI: after rebuilding our Unix CPU Jenkins Slave AMI via the 
scripts in https://github.com/apache/incubator-mxnet-ci/, we intermittently run 
into https://github.com/docker/docker-py/issues/2266 .
   
   The root cause is that when a new instance is spawned and booted, the Docker 
agent is started, and Jenkins immediately requests a container, Docker may take 
longer than 60 seconds to finish starting up. In the 
[log](http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fedge/detail/PR-16722/8/pipeline),
 lines 294 to 295, you can see that the startup took 70 seconds, triggering the 
timeout.
   
   The issue is tracked upstream at 
https://github.com/docker/docker-py/issues/2266 and appears to be a regression 
in Docker. (Note that we were previously using a 2-year-old Docker version, as 
the AMI hadn't been regenerated.) 
   
   Since this is a socket connecting to localhost, there is no risk of 
connection issues, so we do not need a timeout.
   
   Added the fix to this PR, as this PR experienced the issue and is ready to 
be merged.
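
   For illustration, here is a minimal sketch of why the change helps, assuming 
the 60-second default mentioned above. The names `request_times_out` and 
`make_docker_client` are hypothetical helpers and not part of 
`ci/safe_docker_run.py`; only `docker.from_env(timeout=...)` comes from the diff.

```python
# Hypothetical illustration; not code from ci/safe_docker_run.py.

# Assumption: the default client timeout is 60 seconds, as described above.
DEFAULT_TIMEOUT_S = 60


def request_times_out(daemon_startup_s, timeout_s):
    """Return True if a request that waits for the daemon to finish
    starting would exceed the client timeout.

    timeout_s=None means the client waits indefinitely.
    """
    return timeout_s is not None and daemon_startup_s > timeout_s


def make_docker_client(timeout_s=None):
    """Create a Docker client the same way the diff does."""
    import docker  # third-party docker-py package, imported lazily

    return docker.from_env(timeout=timeout_s)


if __name__ == "__main__":
    # The 70-second startup from the log exceeds the 60-second default...
    print(request_times_out(70, DEFAULT_TIMEOUT_S))
    # ...but cannot exceed a disabled timeout.
    print(request_times_out(70, None))
```

   With `timeout=None`, a slow daemon startup delays the first request instead 
of failing it, which is acceptable here because the client only talks to the 
local daemon.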

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
