Seems like the CI will be down until some other people turn off their
instances...

Error
We currently do not have sufficient g3.8xlarge capacity in zones with
support for 'gp2' volumes. Our system will be working on provisioning
additional capacity.

-Marco


On Thu, May 3, 2018 at 9:40 PM, Jin, Hao <[email protected]> wrote:

> Thanks a lot Marco!
> Hao
>
> On 5/3/18, 12:02 PM, "Marco de Abreu" <[email protected]> wrote:
>
>     Hello,
>
>     I'm already investigating the issue and it seems to be related to the
>     recently introduced KVStore tests. They tend to hang, leading to the job
>     being forcefully terminated by Jenkins. The problem here is that this does
>     not terminate the underlying Docker containers, leading to unreleased
>     resources.
>
>     As an immediate solution, I will restart all slaves to ensure the CI is
>     running again. After that, I will try to find a solution to detect and
>     release these containers.
>
>     Best regards,
>     Marco
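
One way the "detect and release these containers" step might look is sketched
below. This is only an illustration and not the fix that was actually applied
to the CI: the function names, the 3 hour age limit, and the assumption that
docker reports State.StartedAt as a UTC timestamp ending in "Z" are all
hypothetical choices made for the example.

```python
import subprocess
from datetime import datetime, timedelta, timezone


def parse_started_at(value):
    """Parse a Docker timestamp such as 2018-05-03T10:02:11.123456789Z (assumed UTC)."""
    # Drop the trailing "Z" and the fractional seconds, which strptime cannot
    # handle at nanosecond precision.
    seconds_part = value.rstrip("Z").split(".")[0]
    parsed = datetime.strptime(seconds_part, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)


def find_stale(started_by_id, now, max_age):
    """Return the sorted IDs of containers running longer than max_age."""
    return sorted(
        cid for cid, started in started_by_id.items() if now - started > max_age
    )


def running_containers():
    """Map each running container ID to its start time, using the docker CLI."""
    ids = subprocess.run(
        ["docker", "ps", "-q"], capture_output=True, text=True, check=True
    ).stdout.split()
    started = {}
    for cid in ids:
        out = subprocess.run(
            ["docker", "inspect", "--format", "{{.State.StartedAt}}", cid],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
        started[cid] = parse_started_at(out)
    return started


def kill_stale(max_age=timedelta(hours=3)):
    """Kill every container older than max_age (3 hours is a made-up limit)."""
    now = datetime.now(timezone.utc)
    for cid in find_stale(running_containers(), now, max_age):
        subprocess.run(["docker", "kill", cid], check=True)
```

Something like kill_stale() could be run periodically on each slave. Keeping
the age check in find_stale() separate from the docker calls means the
selection logic can be tested without a Docker daemon.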
>
>     On Thu, May 3, 2018 at 8:55 PM, Jin, Hao <[email protected]> wrote:
>
>     > I’ve encountered 2 failed GPU builds due to “initialization error:
>     > driver error: failed to process request”, the links to the failed
>     > builds are:
>     > http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-10645/17/pipeline/674
>     > http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-10533/18/pipeline
>     >
>     >
>
>
>
