Re: Some of CI jobs failed due to lack of disk space - now back to normal

2017-03-24 Thread Christian Ridderström
On 21 March 2017 at 21:11, Christian Ridderström wrote:

> I think I primarily want to make the CI workers have more disk space, as I
> think we in general would like to keep e.g. the workspace of the last
> successful build, and at the same time have more than five big CI jobs.
>
> So I've started to look at making a CI worker with a bigger disk, and at
> the same time base it on the new Inria template that uses Ubuntu 16.04 LTS.
>

For info: I've created a new CI worker, lyx-linux0, with an extra volume of
40 GB mounted as /builds/workspace, which is where the CI worker keeps the
builds.
For that worker we should be much better off now regarding disk space. The
only downside I've seen is that the CI server doesn't correctly measure the
remaining disk space on the CI worker - it doesn't seem to see the extra
disk space. I'll go with this for a while and we'll see if/when we run
into problems. Later I'll add more disk space to the other CI workers, or
create new ones, depending on what's easier.
/Christian

PS.
Inria now has a "featured template" with Ubuntu 16.04 which is pretty much
already configured to be used as a CI worker. So it was very quick work to
create the new CI worker.
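
Regarding the CI server not seeing the extra disk space: a minimal sketch of
how one might check, on the worker itself, whether two paths sit on different
filesystems and how much space each reports. It only uses the Python standard
library. The helper names are hypothetical, and it is an assumption (not
confirmed here) that the server measures a path other than /builds/workspace.

```python
import os
import shutil

GB = 1000 ** 3  # decimal gigabytes


def describe(path):
    """Report the device id and the total/free space for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return {
        "path": path,
        "device": os.stat(path).st_dev,
        "total_gb": usage.total / GB,
        "free_gb": usage.free / GB,
    }


def on_separate_filesystems(path_a, path_b):
    """Return True if the two paths live on different devices (e.g. a mounted volume)."""
    return os.stat(path_a).st_dev != os.stat(path_b).st_dev


# Example use on the worker (paths taken from this thread):
#   describe("/")
#   describe("/builds/workspace")
#   on_separate_filesystems("/", "/builds/workspace")
```

If the two paths report different devices, a monitor that only looks at the
root filesystem would not account for the 40 GB volume.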


Re: Some of CI jobs failed due to lack of disk space - now back to normal

2017-03-21 Thread Scott Kostyshak
On Tue, Mar 21, 2017 at 09:11:33PM +0100, Christian Ridderström wrote:
> On 21 March 2017 at 15:48, Scott Kostyshak wrote:
> 
> > > If anyone's interested, you can see remaining disk space (_if logged in_)
> > > for the CI workers at this link:
> > >   https://ci.inria.fr/lyx/computer/
> >
> > Would it be possible and desirable to have an email sent to the
> > developers list if the free disk space is below e.g. 5 GB?
> 
> 
> E-mail notification would be ok, and probably useful.
> I had already looked into it a little, but oddly enough, I didn't find a
> ready-made plugin.
> 
> One of the core problems is related to our use of Docker images. When the
> "build image" runs, it creates files that don't belong to the standard
> user, 'ci'. I've solved this for the case when the build stage succeeds by
> running a second Docker image to 'chown' the workdir to user 'ci'. However,
> if the build stage fails, the 'chown' part is never executed. This leads
> to old workdirs piling up that Jenkins is unable to delete. So far
> I've gone in and deleted them manually.
> 
> Another core problem is that the CI workers only have about 20 GB each...
> and with a test job needing 4 GB, we can quickly run out of space. The
> solution is to make the CI worker have more space, and it should be
> possible to do this, I just haven't figured out how to mount the extra disk
> space yet.
> 
> I think I primarily want to make the CI workers have more disk space, as I
> think we in general would like to keep e.g. the workspace of the last
> successful build, and at the same time have more than five big CI jobs.
> 
> So I've started to look at making a CI worker with a bigger disk, and at
> the same time base it on the new Inria template that uses Ubuntu 16.04 LTS.
> /Christian

Sounds good, Christian. Thanks for all of your work on this.

Scott



Re: Some of CI jobs failed due to lack of disk space - now back to normal

2017-03-21 Thread Christian Ridderström
On 21 March 2017 at 15:48, Scott Kostyshak wrote:

> > If anyone's interested, you can see remaining disk space (_if logged in_)
> > for the CI workers at this link:
> >   https://ci.inria.fr/lyx/computer/
>
> Would it be possible and desirable to have an email sent to the
> developers list if the free disk space is below e.g. 5 GB?


E-mail notification would be ok, and probably useful.
I had already looked into it a little, but oddly enough, I didn't find a
ready-made plugin.
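
In the absence of a plugin, a small script could do the check. The following
is only a sketch under assumptions: the 5 GB threshold comes from the question
above, while the function names and the e-mail addresses are placeholders I
made up for illustration.

```python
import shutil
from email.message import EmailMessage

GB = 1000 ** 3  # decimal gigabytes, as used loosely in this thread


def free_gb(path):
    """Return the free space, in GB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / GB


def build_alert(worker, path, free, threshold_gb=5.0,
                sender="ci@example.org", recipient="devel@example.org"):
    """Return an EmailMessage if `free` is below the threshold, else None.

    The sender and recipient addresses are placeholders, not real ones.
    """
    if free >= threshold_gb:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"[CI] {worker}: {free:.1f} GB free on {path}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        f"Free disk space on {worker} ({path}) is {free:.1f} GB, "
        f"below the threshold of {threshold_gb:.1f} GB.\n"
    )
    return msg
```

Such a script could be run periodically on each worker, e.g. as
build_alert(name, path, free_gb(path)), and a non-None result could be sent
with smtplib.SMTP(...).send_message(msg), assuming a mail relay is reachable
from the worker.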

One of the core problems is related to our use of Docker images. When the
"build image" runs, it creates files that don't belong to the standard
user, 'ci'. I've solved this for the case when the build stage succeeds by
running a second Docker image to 'chown' the workdir to user 'ci'. However,
if the build stage fails, the 'chown' part is never executed. This leads
to old workdirs piling up that Jenkins is unable to delete. So far
I've gone in and deleted them manually.
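
One possible way to make the 'chown' step run even when the build fails is to
wrap the two steps in a try/finally. This is a sketch, not the actual job
configuration: the image name "lyx-build", the use of "busybox" for the chown
step, the /work mount point and the uid/gid values are all assumptions made
for illustration.

```python
import subprocess


def docker_cmds(workdir, build_image="lyx-build", uid=1000, gid=1000):
    """Build the two command lines; image names and ids are hypothetical."""
    mount = f"{workdir}:/work"
    build = ["docker", "run", "--rm", "-v", mount, build_image]
    chown = ["docker", "run", "--rm", "-v", mount, "busybox",
             "chown", "-R", f"{uid}:{gid}", "/work"]
    return build, chown


def build_then_fix_ownership(build_cmd, chown_cmd, run=subprocess.run):
    """Run the build, then always run the ownership fix.

    If the build fails, the CalledProcessError still propagates after the
    chown step has been attempted, so the job is reported as failed.
    """
    try:
        run(build_cmd, check=True)
    finally:
        # Executed whether the build succeeded or raised an exception.
        run(chown_cmd, check=False)
```

With this structure a failed build would still leave the workdir owned by
'ci', so Jenkins should be able to delete it afterwards.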

Another core problem is that the CI workers only have about 20 GB each...
and with a test job needing 4 GB, we can quickly run out of space. The
solution is to make the CI worker have more space, and it should be
possible to do this, I just haven't figured out how to mount the extra disk
space yet.

I think I primarily want to make the CI workers have more disk space, as I
think we in general would like to keep e.g. the workspace of the last
successful build, and at the same time have more than five big CI jobs.

So I've started to look at making a CI worker with a bigger disk, and at
the same time base it on the new Inria template that uses Ubuntu 16.04 LTS.
/Christian


Re: Some of CI jobs failed due to lack of disk space - now back to normal

2017-03-21 Thread Scott Kostyshak
On Sun, Mar 19, 2017 at 10:59:53PM +0100, Christian Ridderström wrote:
> Hi,
> 
> Just want to let you know that some of the CI jobs simply failed because
> the CI workers ran out of disk space. Some CI jobs take 4 GB, and with no
> remaining workdirs from CI jobs, the total disk space per slave is about 20
> GB. Anyway, I've done some cleaning and this aspect should be back to
> normal now.
> 
> If anyone's interested, you can see remaining disk space (_if logged in_)
> for the CI workers at this link:
>   https://ci.inria.fr/lyx/computer/

Would it be possible and desirable to have an email sent to the
developers list if the free disk space is below e.g. 5 GB?

By the way, thanks for all of this work Christian. CI testing is a great
step forward!

Scott



Some of CI jobs failed due to lack of disk space - now back to normal

2017-03-19 Thread Christian Ridderström
Hi,

Just want to let you know that some of the CI jobs simply failed because
the CI workers ran out of disk space. Some CI jobs take 4 GB, and with no
remaining workdirs from CI jobs, the total disk space per slave is about 20
GB. Anyway, I've done some cleaning and this aspect should be back to
normal now.

If anyone's interested, you can see remaining disk space (_if logged in_)
for the CI workers at this link:
  https://ci.inria.fr/lyx/computer/

/Christian