josephevans opened a new pull request #19903: URL: https://github.com/apache/incubator-mxnet/pull/19903
## Description ##
Forward port #19895 from v1.x to master.

This PR makes a number of changes to improve stability:
* Remove the SafeDocker client, which uses the python docker package to run containers. Instead, invoke the "docker run" command directly via subprocess.call(). The python-docker client does not support the gpus parameter that newer docker versions use, and we don't get timeout issues when using the docker command directly. This will allow us to update our AMIs to use newer docker versions.
* To support both docker variants simultaneously, we first try passing the --gpus all parameter to docker run. If that fails with error code 125 (which means the docker run command itself failed), we retry using the old --runtime nvidia parameter.
* Remove the extra custom codecov calls, which usually fail and have to be retried multiple times even though the initial codecov command works.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
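
The GPU fallback described in the PR description can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the names `run_gpu_container` and `build_docker_cmd`, the injectable `call` parameter, and the `--rm` flag are assumptions made for this example.

```python
import subprocess
from typing import Callable, List, Sequence

# Exit code 125 means the `docker run` command itself failed,
# as opposed to the command inside the container failing.
DOCKER_RUN_FAILED = 125


def build_docker_cmd(image: str, command: Sequence[str],
                     gpu_flags: Sequence[str]) -> List[str]:
    """Assemble a `docker run` argument list (hypothetical helper)."""
    return ["docker", "run", "--rm", *gpu_flags, image, *command]


def run_gpu_container(image: str, command: Sequence[str],
                      call: Callable[[List[str]], int] = subprocess.call) -> int:
    """Run a GPU container, falling back to the older runtime flag.

    `call` defaults to subprocess.call; it is injectable here only so the
    sketch can be exercised without a docker installation.
    """
    # Newer docker versions: request GPUs with --gpus all.
    ret = call(build_docker_cmd(image, command, ["--gpus", "all"]))
    if ret == DOCKER_RUN_FAILED:
        # Older docker versions: retry with --runtime nvidia.
        ret = call(build_docker_cmd(image, command, ["--runtime", "nvidia"]))
    return ret
```

Any other non-zero return code is passed through unchanged, so a failure of the command inside the container does not trigger the retry.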
