Hey,

I've recently created a solution for the growing /tmp directory. Part of it is the job mentioned by Tyson: *beam_Clean_tmp_directory*. It is intentionally not triggered by cron and is meant as a last resort for unusual cases.
Along with that job, I've also updated every worker with an internal cron script. It runs once a week and deletes all files (and only files) that have not been accessed for at least three days. It is designed to be as safe as possible for jobs running on the worker (files that are still in use are not deleted) and to be insensitive to the current workload on the machine: the cleanup always happens, even if some long-running or stuck jobs are blocking the machine.

I also think that the current "No space left" errors may be a consequence of the growing workspace directory rather than /tmp. I didn't do any detailed analysis, but, for example, on apache-beam-jenkins-7 the workspace directory is currently 158 GB while /tmp is only 16 GB. We should either guarantee enough disk space to hold the workspaces for all jobs (because eventually every worker will execute each job) or also clear the workspaces in some way.

Regards,
Damian

On Mon, Jul 20, 2020 at 10:43 AM Maximilian Michels <m...@apache.org> wrote:

> +1 for scheduling it via a cron job if it won't lead to test failures
> while running. Not a Jenkins expert but maybe there is the notion of
> running exclusively while no other tasks are running?
>
> -Max
>
> On 17.07.20 21:49, Tyson Hamilton wrote:
> > FYI there was a job introduced to do this in Jenkins:
> > beam_Clean_tmp_directory
> >
> > Currently it needs to be run manually. I'm seeing some out of disk
> > related errors in precommit tests currently, perhaps we should schedule
> > this job with cron?
> >
> >
> > On 2020/03/11 19:31:13, Heejong Lee <heej...@google.com> wrote:
> >> Still seeing no space left on device errors on jenkins-7 (for example:
> >> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/)
> >>
> >>
> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <amyrv...@google.com> wrote:
> >>
> >>> Did a one time cleanup of tmp files owned by jenkins older than 3 days.
> >>> Agree that we need a longer term solution.
> >>>
> >>> Passing recent tests on all executors except jenkins-12, which has not
> >>> scheduled recent builds for the past 13 days. Not scheduling:
> >>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
> >>> Recent passing builds:
> >>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
> >>>
> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <al...@google.com> wrote:
> >>>
> >>>> +Alan Myrvold <amyrv...@google.com> is doing a one time cleanup. I agree
> >>>> that we need to have a solution to automate this task or address the root
> >>>> cause of the buildup.
> >>>>
> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia <michal.wale...@polidea.com>
> >>>> wrote:
> >>>>
> >>>>> Hi there,
> >>>>> it seems we have a problem with Jenkins workers again. Nodes 1 and 7
> >>>>> both fail jobs with "No space left on device".
> >>>>> Who is the best person to contact in these cases (someone with access
> >>>>> permissions to the workers).
> >>>>>
> >>>>> I also noticed that such errors are becoming more and more frequent
> >>>>> recently and I'd like to discuss how can this be remedied. Can a cleanup
> >>>>> task be automated on Jenkins somehow?
> >>>>>
> >>>>> Regards
> >>>>> Michal
> >>>>>
> >>>>> --
> >>>>>
> >>>>> Michał Walenia
> >>>>> Polidea <https://www.polidea.com/> | Software Engineer
> >>>>>
> >>>>> M: +48 791 432 002 <+48791432002>
> >>>>> E: michal.wale...@polidea.com
> >>>>>
> >>>>> Unique Tech
> >>>>> Check out our projects! <https://www.polidea.com/our-work>
> >>>>>
> >>>>
> >>
>
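
For anyone who wants to experiment with the access-time-based cleanup described at the top of this message (delete only files, and only those not accessed for at least three days), here is a minimal Python sketch. It is not the script deployed on the workers, which is not shown in this thread; the function name `remove_stale_files`, its parameters, and the choice of Python are assumptions made purely for illustration.

```python
import os
import stat
import time
from pathlib import Path


def remove_stale_files(root, max_idle_days=3.0, now=None, dry_run=False):
    """Delete regular files under `root` not accessed for `max_idle_days`.

    Hypothetical helper, not the real worker script. Directories and
    symlinks are never removed. Returns the list of matched paths.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 24 * 60 * 60
    matched = []
    # os.walk does not follow symlinked directories by default,
    # so the walk stays inside `root`.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.lstat()  # lstat: inspect the link, not its target
                if not stat.S_ISREG(info.st_mode):
                    continue  # skip symlinks, sockets, FIFOs, ...
                if info.st_atime >= cutoff:
                    continue  # accessed recently, possibly still in use
                if not dry_run:
                    path.unlink()
            except OSError:
                # The file vanished or is not ours to delete; keep going so
                # one bad entry does not stop the whole cleanup.
                continue
            matched.append(path)
    return matched
```

Calling `remove_stale_files("/tmp", dry_run=True)` first lists what would be removed without deleting anything. Note that this relies on the filesystem recording access times: on a mount with `noatime`, every file would eventually look stale.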