Re: Some of CI jobs failed due to lack of disk space - now back to normal
On 21 March 2017 at 21:11, Christian Ridderström wrote:

> I think I primarily want to make the CI workers have more disk space, as I
> think we in general would like to keep e.g. the workspace of the last
> successful build, and at the same time have more than five big CI jobs.
>
> So I've started to look at making a CI worker with a bigger disk, and at
> the same time base it on the new Inria template that uses Ubuntu 16.04 LTS.

For info: I've created a new CI worker, lyx-linux0, with an extra volume of 40 GB mounted as /builds/workspace, which is where the CI worker keeps the builds. For that worker we should be much better off now regarding disk space.

The only downside I've seen is that the CI server doesn't correctly measure the remaining disk space on the CI worker - it doesn't seem to see the extra disk space. I'll go with this for a while and we'll see if/when we run into problems.

Later I'll add more disk space to the other CI workers, or create new ones, depending on what's easier.

/Christian

PS. Inria now has a "featured template" with Ubuntu 16.04 which is pretty much already configured to be used as a CI worker. So it was very quick work to create the new CI worker.
Re: Some of CI jobs failed due to lack of disk space - now back to normal
On Tue, Mar 21, 2017 at 09:11:33PM +0100, Christian Ridderström wrote:
> On 21 March 2017 at 15:48, Scott Kostyshak wrote:
>
> > > If anyone's interested, you can see remaining disk space (_if logged in_)
> > > for the CI workers at this link:
> > > https://ci.inria.fr/lyx/computer/
> >
> > Would it be possible and desirable to have an email sent to the
> > developers list if the free disk space is below e.g. 5 GB?
>
> E-mail notification would be ok, and probably useful.
> I had already looked into it a little, but oddly enough, I didn't find a
> ready-made plugin.
>
> One of the core problems is related to our use of Docker images. When the
> "build image" runs, it creates files that don't belong to the standard
> user, 'ci'. I've solved this for the case when the build stage succeeds by
> running a second Docker image to 'chown' the workdir to user 'ci'. However,
> if the build stage fails, the 'chown' part is never executed. This leads
> to old workdirs piling up, and Jenkins not being able to delete them. Then
> I've gone in and manually deleted them.
>
> Another core problem is that the CI workers only have about 20 GB each...
> and with a test job needing 4 GB, we can quickly run out of space. The
> solution is to make the CI worker have more space, and it should be
> possible to do this, I just haven't figured out how to mount the extra disk
> space yet.
>
> I think I primarily want to make the CI workers have more disk space, as I
> think we in general would like to keep e.g. the workspace of the last
> successful build, and at the same time have more than five big CI jobs.
>
> So I've started to look at making a CI worker with a bigger disk, and at
> the same time base it on the new Inria template that uses Ubuntu 16.04 LTS.
> /Christian

Sounds good, Christian. Thanks for all of your work on this.

Scott
Re: Some of CI jobs failed due to lack of disk space - now back to normal
On 21 March 2017 at 15:48, Scott Kostyshak wrote:

> > If anyone's interested, you can see remaining disk space (_if logged in_)
> > for the CI workers at this link:
> > https://ci.inria.fr/lyx/computer/
>
> Would it be possible and desirable to have an email sent to the
> developers list if the free disk space is below e.g. 5 GB?

E-mail notification would be ok, and probably useful. I had already looked into it a little, but oddly enough, I didn't find a ready-made plugin.

One of the core problems is related to our use of Docker images. When the "build image" runs, it creates files that don't belong to the standard user, 'ci'. I've solved this for the case when the build stage succeeds by running a second Docker image to 'chown' the workdir to user 'ci'. However, if the build stage fails, the 'chown' part is never executed. This leads to old workdirs piling up, and Jenkins not being able to delete them. So far I've gone in and deleted them manually.

Another core problem is that the CI workers only have about 20 GB each... and with a test job needing 4 GB, we can quickly run out of space. The solution is to make the CI worker have more space, and it should be possible to do this, I just haven't figured out how to mount the extra disk space yet.

I think I primarily want to make the CI workers have more disk space, as I think we in general would like to keep e.g. the workspace of the last successful build, and at the same time have more than five big CI jobs.

So I've started to look at making a CI worker with a bigger disk, and at the same time base it on the new Inria template that uses Ubuntu 16.04 LTS.

/Christian
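The ownership problem described in this message (the 'chown' step being skipped whenever the build stage fails) is the kind of thing a "cleanup always runs" pattern addresses. The following is only a minimal sketch of that pattern in Python, not the job configuration actually used on the CI workers. The function name `build_then_fix_ownership` and its parameters are hypothetical, and the two commands are passed in by the caller because the thread does not show the real Docker invocations.

```python
import subprocess
from typing import Callable, Sequence

Command = Sequence[str]


def build_then_fix_ownership(
    build_cmd: Command,
    chown_cmd: Command,
    run: Callable[[Command], int] = subprocess.call,
) -> int:
    """Run the build, then run the ownership fix even if the build fails.

    Returns the build's exit status so the CI job can still report failure.
    """
    try:
        build_status = run(build_cmd)
    finally:
        # Executed whether the build returned normally or raised an
        # exception, so the workdir is handed back to the CI user and
        # can later be deleted.
        run(chown_cmd)
    return build_status
```

With this shape, a failing build still returns a non-zero status, but the ownership fix has run by the time the job ends, so old workdirs should no longer be left undeletable.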
Re: Some of CI jobs failed due to lack of disk space - now back to normal
On Sun, Mar 19, 2017 at 10:59:53PM +0100, Christian Ridderström wrote:
> Hi,
>
> Just want to let you know that some of the CI jobs simply failed because
> the CI workers ran out of disk space. Some CI jobs take 4 GB, and with no
> remaining workdirs from CI jobs, the total disk space per slave is about 20
> GB. Anyway, I've done some cleaning and this aspect should be back to
> normal now.
>
> If anyone's interested, you can see remaining disk space (_if logged in_)
> for the CI workers at this link:
> https://ci.inria.fr/lyx/computer/

Would it be possible and desirable to have an email sent to the developers list if the free disk space is below e.g. 5 GB?

By the way, thanks for all of this work Christian. CI testing is a great step forward!

Scott
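The threshold check asked about in this message could look roughly like the sketch below. It is an illustration only: the function name `low_disk_warning` is hypothetical, "5 GB" is interpreted here as 5 * 1024**3 bytes (an assumption), and actually sending the message to the developers list is left out. It relies on the standard-library call `shutil.disk_usage`, which reports the filesystem that contains the given path.

```python
import shutil
from typing import Callable, Optional

GB = 1024 ** 3  # assumption: "GB" in the thread is treated as 2**30 bytes


def low_disk_warning(
    path: str,
    threshold_gb: float = 5.0,
    disk_usage: Callable = shutil.disk_usage,
) -> Optional[str]:
    """Return a warning text if free space at `path` is below the threshold.

    Returns None when there is enough space, so the caller only needs to
    send a notification when a string comes back.
    """
    free_gb = disk_usage(path).free / GB
    if free_gb < threshold_gb:
        return (
            f"Low disk space on {path}: {free_gb:.1f} GB free "
            f"(threshold {threshold_gb:.1f} GB)"
        )
    return None
```

A periodic job on each CI worker could call this for the directory holding the builds and mail the returned text when it is not None.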
Some of CI jobs failed due to lack of disk space - now back to normal
Hi,

Just want to let you know that some of the CI jobs simply failed because the CI workers ran out of disk space. Some CI jobs take 4 GB, and with no remaining workdirs from CI jobs, the total disk space per slave is about 20 GB. Anyway, I've done some cleaning and this aspect should be back to normal now.

If anyone's interested, you can see remaining disk space (_if logged in_) for the CI workers at this link:
https://ci.inria.fr/lyx/computer/

/Christian