Hello everybody,

I just wanted to inform you that there have been a few hiccups in our CI
system which might have led to some of your PRs failing. I tried to
re-trigger the ones I saw, but please feel free to push an empty commit
(git commit --allow-empty) to re-trigger a build yourself.

To give some background: This is caused by the Docker cache on our Ubuntu CPU
instances not being properly purged due to the high number of Dockerfile
changes. As of now, Jenkins already detects this ahead of time and
disconnects the slaves, but this results in resource starvation and faulty
slaves. For the time being, I have redeployed all Ubuntu CPU slaves and
increased the disk size from 350GB to 550GB.
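
For illustration only, a disk check of the kind that would trigger a cache
purge could look roughly like the sketch below. This is not what Jenkins or
the slaves actually run; the function names and the 80% threshold are
assumptions made up for this example.

```python
import shutil


def disk_usage_fraction(path="/"):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def needs_purge(used_fraction, threshold=0.8):
    """Hypothetical policy: purge once usage reaches the threshold (assumed 80%)."""
    return used_fraction >= threshold


if __name__ == "__main__":
    fraction = disk_usage_fraction("/")
    if needs_purge(fraction):
        # A real purge job might call something like 'docker system prune' here.
        print(f"Disk {fraction:.0%} full: Docker cache should be purged")
    else:
        print(f"Disk {fraction:.0%} full: no purge needed")
```

Running a check like this periodically, before the disk fills up, would let a
slave clean up after itself instead of being disconnected.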

I'm currently working on the auto scaling project for our CI, which will
also introduce a new set of slaves with a new architecture. These slaves
will have a proper purge job and this problem should not come up again
after deploying them to production. In case precautions like these fail,
our systems will automatically detect faulty or degrading slaves, remove
them from the running system without impairing running builds, and make
sure they get replaced.

Considering it's only a few more weeks until the expected roll-out, I've
taken the liberty of focusing my time on the development of that project and
decided to leave the slaves as they are for now. In the future, this
house-keeping will be completely automated.

Best regards,
Marco
