Hey,

I've recently put together a solution for the growing /tmp directory. Part of
it is the job Tyson mentioned: *beam_Clean_tmp_directory*. It is intentionally
not triggered by cron and is meant as a last resort for unusual cases.

Along with that job, I've also added an internal cron script to every worker.
It runs once a week and deletes all files (and only files) that have not been
accessed for at least three days. It is designed to be as safe as possible
for jobs running on the worker (so that it does not delete files that are
still in use) and to be independent of the current workload on the machine.
The cleanup always happens, even if long-running or stuck jobs are blocking
the machine.
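
To illustrate the rule described above (files only, untouched for at least
three days), here is a minimal Python sketch. It is not the script deployed
on the workers; the function name, its parameters and the dry-run option are
assumptions made for illustration. It also relies on the filesystem recording
access times, which is not the case on mounts that use noatime.

```python
import os
import stat
import time

SECONDS_PER_DAY = 24 * 60 * 60


def remove_stale_files(root, max_idle_days=3, now=None, dry_run=False):
    """Delete regular files under `root` not accessed for `max_idle_days`.

    Directories, symlinks and other special files are left alone.
    Returns the list of paths that were (or, in dry-run mode, would be)
    removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(info.st_mode):
                continue  # only plain files are candidates
            if info.st_atime > cutoff:
                continue  # accessed recently, possibly still in use
            if not dry_run:
                try:
                    os.remove(path)
                except OSError:
                    continue  # could not delete; leave it for next run
            removed.append(path)
    return removed


# Example: list what would be deleted without touching anything.
# stale = remove_stale_files("/tmp", max_idle_days=3, dry_run=True)
```

Keying on access time rather than on job state is what makes the cleanup
independent of whatever is running: a file that a job still reads keeps a
recent access time and is skipped.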

I also think the current "No space left" errors may be caused by the growing
workspace directory rather than /tmp. I haven't done a detailed analysis, but
for example, on apache-beam-jenkins-7 the workspace directory is currently
158 GB while /tmp is only 16 GB. We should either guarantee enough disk space
to hold the workspaces for all jobs (because eventually every worker will
execute every job) or also clean up the workspaces in some way.
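
For anyone who wants to repeat the size comparison on another worker, a
rough Python sketch follows. The helper names are hypothetical, and the
workspace path depends on how the worker is set up, so it is passed in
rather than hard-coded. The sketch sums apparent file sizes, so the numbers
can differ somewhat from what du reports.

```python
import os


def directory_size_bytes(root):
    """Sum the apparent sizes of all regular files and links under `root`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                # lstat so a symlink counts as itself, not as its target
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # file disappeared while walking; ignore it
    return total


def compare_directories(paths):
    """Return {path: size in GB} for each directory in `paths`."""
    return {p: directory_size_bytes(p) / 1e9 for p in paths}


# Example (the workspace path is a placeholder, not the real location):
# print(compare_directories(["/tmp", "/path/to/jenkins/workspace"]))
```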

Regards,
Damian


On Mon, Jul 20, 2020 at 10:43 AM Maximilian Michels <m...@apache.org> wrote:

> +1 for scheduling it via a cron job if it won't lead to test failures
> while running. Not a Jenkins expert but maybe there is the notion of
> running exclusively while no other tasks are running?
>
> -Max
>
> On 17.07.20 21:49, Tyson Hamilton wrote:
> > FYI there was a job introduced to do this in Jenkins:
> > beam_Clean_tmp_directory
> >
> > Currently it needs to be run manually. I'm seeing some out of disk
> > related errors in precommit tests currently, perhaps we should schedule
> > this job with cron?
> >
> >
> > On 2020/03/11 19:31:13, Heejong Lee <heej...@google.com> wrote:
> >> Still seeing no space left on device errors on jenkins-7 (for example:
> >> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/)
> >>
> >>
> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <amyrv...@google.com> wrote:
> >>
> >>> Did a one time cleanup of tmp files owned by jenkins older than 3 days.
> >>> Agree that we need a longer term solution.
> >>>
> >>> Passing recent tests on all executors except jenkins-12, which has not
> >>> scheduled recent builds for the past 13 days. Not scheduling:
> >>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
> >>> Recent passing builds:
> >>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
> >>>
> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <al...@google.com> wrote:
> >>>
> >>>> +Alan Myrvold <amyrv...@google.com> is doing a one time cleanup. I agree
> >>>> that we need to have a solution to automate this task or address the root
> >>>> cause of the buildup.
> >>>>
> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia <michal.wale...@polidea.com> wrote:
> >>>>
> >>>>> Hi there,
> >>>>> it seems we have a problem with Jenkins workers again. Nodes 1 and 7
> >>>>> both fail jobs with "No space left on device".
> >>>>> Who is the best person to contact in these cases (someone with access
> >>>>> permissions to the workers).
> >>>>>
> >>>>> I also noticed that such errors are becoming more and more frequent
> >>>>> recently and I'd like to discuss how can this be remedied. Can a cleanup
> >>>>> task be automated on Jenkins somehow?
> >>>>>
> >>>>> Regards
> >>>>> Michal
> >>>>>
> >>>>> --
> >>>>>
> >>>>> Michał Walenia
> >>>>> Polidea <https://www.polidea.com/> | Software Engineer
> >>>>>
> >>>>> M: +48 791 432 002 <+48791432002>
> >>>>> E: michal.wale...@polidea.com
> >>>>>
> >>>>> Unique Tech
> >>>>> Check out our projects! <https://www.polidea.com/our-work>
> >>>>>
> >>>>
> >>
>
