josephevans commented on a change in pull request #19890:
URL: https://github.com/apache/incubator-mxnet/pull/19890#discussion_r575473739



##########
File path: ci/safe_docker_run.py
##########
@@ -117,6 +119,9 @@ def run(self, *args, **kwargs) -> int:
         ret = 0
         try:
             # Race condition:
+            # add a random sleep to (a) give docker time to flush disk buffer after pulling image
+            # and (b) minimize race conditions between jenkins runs on same host
+            time.sleep(random.randint(2,10))

Review comment:
       Each Jenkins slave (the Linux CPU nodes, at least) has 2 "slots" that
can run jobs in parallel. When 2 jobs using the same Docker image start at the
exact same time on these 2 slots, both attempt to pull the image from ECR and
start a container. If we randomize the delay, the idea is that the two
containers won't be requested to start at the exact same time.
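
A minimal sketch of the randomized delay described above, pulled out into a standalone helper so it can be exercised without actually sleeping. The name `jittered_sleep` and its `rng` and `sleep` parameters are hypothetical and exist only for illustration; the PR itself calls `time.sleep(random.randint(2,10))` inline.

```python
import random
import time
from typing import Callable, Optional


def jittered_sleep(
    min_seconds: int = 2,
    max_seconds: int = 10,
    rng: Optional[random.Random] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Sleep for a random whole number of seconds and return the delay used.

    Hypothetical helper for illustration only; not part of safe_docker_run.py.
    """
    # Each process gets its own generator unless one is injected (e.g. in tests).
    rng = rng or random.Random()
    # randint is inclusive on both ends, so the defaults allow 9 distinct delays.
    delay = rng.randint(min_seconds, max_seconds)
    sleep(delay)
    return delay


if __name__ == "__main__":
    # Inject a no-op sleep so the demo returns immediately.
    print(jittered_sleep(sleep=lambda seconds: None))
```

With only nine possible integer delays, two jobs that draw independently will still pick the same value roughly one time in nine, so this lowers the chance of a simultaneous start rather than eliminating it.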




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

