Blech. I just realized that node 15 is 'offline', so the cleanup job won't work. I'll clean up manually on the machine using the cron script's command.
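For reference, the command I have in mind is roughly the following — a sketch matching the weekly cron script's semantics (delete only files, untouched for at least three days); the exact flags on the workers may differ:

    # remove files (never directories) under /tmp not accessed for 3+ days
    sudo find /tmp -xdev -type f -atime +2 -delete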
On Fri, Jul 24, 2020 at 10:56 AM Tyson Hamilton <tyso...@google.com> wrote:

> Something isn't working with the current setup, because node 15 appears to be out of space and is currently 'offline' according to Jenkins. Can someone run the cleanup job? The machine is full:
>
> @apache-ci-beam-jenkins-15:/tmp$ df -h
> Filesystem      Size  Used Avail Use% Mounted on
> udev             52G     0   52G   0% /dev
> tmpfs            11G  265M   10G   3% /run
> /dev/sda1       485G  484G  880M 100% /
> tmpfs            52G     0   52G   0% /dev/shm
> tmpfs           5.0M     0  5.0M   0% /run/lock
> tmpfs            52G     0   52G   0% /sys/fs/cgroup
> tmpfs            11G     0   11G   0% /run/user/1017
> tmpfs            11G     0   11G   0% /run/user/1037
>
> apache-ci-beam-jenkins-15:/tmp$ sudo du -ah --time . | sort -rhk 1,1 | head -n 20
> 20G   2020-07-24 17:52  .
> 580M  2020-07-22 17:31  ./junit1031982597110125586
> 517M  2020-07-22 17:31  ./junit1031982597110125586/junit8739924829337821410/heap_dump.hprof
> 517M  2020-07-22 17:31  ./junit1031982597110125586/junit8739924829337821410
> 263M  2020-07-22 12:23  ./pip-install-2GUhO_
> 263M  2020-07-20 09:30  ./pip-install-sxgwqr
> 263M  2020-07-17 13:56  ./pip-install-bWSKIV
> 242M  2020-07-21 20:25  ./beam-pipeline-tempmByU6T
> 242M  2020-07-21 20:21  ./beam-pipeline-tempV85xeK
> 242M  2020-07-21 20:15  ./beam-pipeline-temp7dJROJ
> 236M  2020-07-21 20:25  ./beam-pipeline-tempmByU6T/tmpOWj3Yr
> 236M  2020-07-21 20:21  ./beam-pipeline-tempV85xeK/tmppbQHB3
> 236M  2020-07-21 20:15  ./beam-pipeline-temp7dJROJ/tmpgOXPKW
> 111M  2020-07-23 00:57  ./pip-install-1JnyNE
> 105M  2020-07-23 00:17  ./beam-artifact1374651823280819755
> 105M  2020-07-23 00:16  ./beam-artifact5050755582921936972
> 105M  2020-07-23 00:16  ./beam-artifact1834064452502646289
> 105M  2020-07-23 00:15  ./beam-artifact682561790267074916
> 105M  2020-07-23 00:15  ./beam-artifact4691304965824489394
> 105M  2020-07-23 00:14  ./beam-artifact4050383819822604421
>
> On Wed, Jul 22, 2020 at 12:03 PM Robert Bradshaw <rober...@google.com> wrote:
>
>> On Wed, Jul 22, 2020 at 11:57 AM Tyson Hamilton <tyso...@google.com> wrote:
>>
>>> Ah I see, thanks Kenn. I found some advice on the Apache infra wiki that also suggests using a tmp dir inside the workspace [1]:
>>>
>>> Procedures projects can take to clean up disk space
>>>
>>> Projects can help themselves and Infra by taking some basic steps to clean up after their jobs on the build nodes:
>>>
>>> 1. Use a ./tmp dir in your job's workspace. That way it gets cleaned up when job workspaces expire.
>>
>> Tests should be (able to be) written to use the standard temporary-file mechanisms, with the environment set up on Jenkins such that those files fall into the respective workspaces. Ideally this should be as simple as setting the TMPDIR (or similar) environment variable (and making sure it exists and is writable).
>>
>>> 2. Configure your jobs to wipe workspaces on start or finish.
>>> 3. Configure your jobs to only keep 5 or 10 previous builds.
>>> 4. Configure your jobs to only keep 5 or 10 previous artifacts.
>>>
>>> [1]: https://cwiki.apache.org/confluence/display/INFRA/Disk+Space+cleanup+of+Jenkins+nodes
>>>
>>> On Wed, Jul 22, 2020 at 8:06 AM Kenneth Knowles <k...@apache.org> wrote:
>>>
>>>> Those file listings look like the result of using standard temp file APIs but with TMPDIR set to /tmp.
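>>>>
>>>> A minimal sketch of what I'd expect the job environment to do (assuming Jenkins' usual $WORKSPACE variable; this is not what our jobs currently configure):
>>>>
>>>>   # point the standard temp-file APIs into the workspace
>>>>   export TMPDIR="$WORKSPACE/tmp"
>>>>   mkdir -p "$TMPDIR"   # TMPDIR must exist and be writable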
>>>>
>>>> On Mon, Jul 20, 2020 at 7:55 PM Tyson Hamilton <tyso...@google.com> wrote:
>>>>
>>>>> Jobs are hermetic as far as I can tell and use unique subdirectories inside of /tmp. Here is a quick look at two examples:
>>>>>
>>>>> @apache-ci-beam-jenkins-4:/tmp$ sudo du -ah --time . | sort -rhk 1,1 | head -n 20
>>>>> 1.6G  2020-07-21 02:25  .
>>>>> 242M  2020-07-17 18:48  ./beam-pipeline-temp3ybuY4
>>>>> 242M  2020-07-17 18:46  ./beam-pipeline-tempuxjiPT
>>>>> 242M  2020-07-17 18:44  ./beam-pipeline-tempVpg1ME
>>>>> 242M  2020-07-17 18:42  ./beam-pipeline-tempJ4EpyB
>>>>> 242M  2020-07-17 18:39  ./beam-pipeline-tempepea7Q
>>>>> 242M  2020-07-17 18:35  ./beam-pipeline-temp79qot2
>>>>> 236M  2020-07-17 18:48  ./beam-pipeline-temp3ybuY4/tmpy_Ytzz
>>>>> 236M  2020-07-17 18:46  ./beam-pipeline-tempuxjiPT/tmpN5_UfJ
>>>>> 236M  2020-07-17 18:44  ./beam-pipeline-tempVpg1ME/tmpxSm8pX
>>>>> 236M  2020-07-17 18:42  ./beam-pipeline-tempJ4EpyB/tmpMZJU76
>>>>> 236M  2020-07-17 18:39  ./beam-pipeline-tempepea7Q/tmpWy1vWX
>>>>> 236M  2020-07-17 18:35  ./beam-pipeline-temp79qot2/tmpvN7vWA
>>>>> 3.7M  2020-07-17 18:48  ./beam-pipeline-temp3ybuY4/tmprlh_di
>>>>> 3.7M  2020-07-17 18:46  ./beam-pipeline-tempuxjiPT/tmpLmVWfe
>>>>> 3.7M  2020-07-17 18:44  ./beam-pipeline-tempVpg1ME/tmpvrxbY7
>>>>> 3.7M  2020-07-17 18:42  ./beam-pipeline-tempJ4EpyB/tmpLTb6Mj
>>>>> 3.7M  2020-07-17 18:39  ./beam-pipeline-tempepea7Q/tmptYF1v1
>>>>> 3.7M  2020-07-17 18:35  ./beam-pipeline-temp79qot2/tmplfV0Rg
>>>>> 2.7M  2020-07-17 20:10  ./pip-install-q9l227ef
>>>>>
>>>>> @apache-ci-beam-jenkins-11:/tmp$ sudo du -ah --time . | sort -rhk 1,1 | head -n 20
>>>>> 817M  2020-07-21 02:26  .
>>>>> 242M  2020-07-19 12:14  ./beam-pipeline-tempUTXqlM
>>>>> 242M  2020-07-19 12:11  ./beam-pipeline-tempx3Yno3
>>>>> 242M  2020-07-19 12:05  ./beam-pipeline-tempyCrMYq
>>>>> 236M  2020-07-19 12:14  ./beam-pipeline-tempUTXqlM/tmpstXoL0
>>>>> 236M  2020-07-19 12:11  ./beam-pipeline-tempx3Yno3/tmpnnVn65
>>>>> 236M  2020-07-19 12:05  ./beam-pipeline-tempyCrMYq/tmpRF0iNs
>>>>> 3.7M  2020-07-19 12:14  ./beam-pipeline-tempUTXqlM/tmpbJjUAQ
>>>>> 3.7M  2020-07-19 12:11  ./beam-pipeline-tempx3Yno3/tmpsmmzqe
>>>>> 3.7M  2020-07-19 12:05  ./beam-pipeline-tempyCrMYq/tmp5b3ZvY
>>>>> 2.0M  2020-07-19 12:14  ./beam-pipeline-tempUTXqlM/tmpoj3orz
>>>>> 2.0M  2020-07-19 12:11  ./beam-pipeline-tempx3Yno3/tmptng9sZ
>>>>> 2.0M  2020-07-19 12:05  ./beam-pipeline-tempyCrMYq/tmpWp6njc
>>>>> 1.2M  2020-07-19 12:14  ./beam-pipeline-tempUTXqlM/tmphgdj35
>>>>> 1.2M  2020-07-19 12:11  ./beam-pipeline-tempx3Yno3/tmp8ySXpm
>>>>> 1.2M  2020-07-19 12:05  ./beam-pipeline-tempyCrMYq/tmpNVEJ4e
>>>>> 992K  2020-07-12 12:00  ./junit642086915811430564
>>>>> 988K  2020-07-12 12:00  ./junit642086915811430564/beam
>>>>> 984K  2020-07-12 12:00  ./junit642086915811430564/beam/nodes
>>>>> 980K  2020-07-12 12:00  ./junit642086915811430564/beam/nodes/0
>>>>>
>>>>> On Mon, Jul 20, 2020 at 6:46 PM Udi Meiri <eh...@google.com> wrote:
>>>>>
>>>>>> You're right, job workspaces should be hermetic.
>>>>>>
>>>>>> On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles <k...@apache.org> wrote:
>>>>>>
>>>>>>> I'm probably late to this discussion and missing something, but why are we writing to /tmp at all? I would expect TMPDIR to point somewhere inside the job directory that will be wiped by Jenkins, and I would expect code to always create temp files via APIs that respect this. Is Jenkins not cleaning up? Do we not have the ability to set this up? Do we have bugs in our code (which we could probably find by setting TMPDIR to somewhere other than /tmp and running the tests without write permission to /tmp)?
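>>>>>>>
>>>>>>> A quick way to smoke out such bugs — a sketch only, where ':sdks:python:test' is a placeholder for whatever suite you want to check:
>>>>>>>
>>>>>>>   export TMPDIR="$WORKSPACE/tmp" && mkdir -p "$TMPDIR"
>>>>>>>   touch /tmp/.before-tests                           # timestamp marker
>>>>>>>   ./gradlew :sdks:python:test                        # placeholder task
>>>>>>>   find /tmp -xdev -type f -newer /tmp/.before-tests  # anything listed was written to /tmp despite TMPDIR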
>>>>>>>
>>>>>>> Kenn
>>>>>>>
>>>>>>> On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay <al...@google.com> wrote:
>>>>>>>
>>>>>>>> Related to workspace directory growth, +Udi Meiri <eh...@google.com> previously filed a relevant issue (https://issues.apache.org/jira/browse/BEAM-9865) about cleaning up the workspace directory after successful jobs. Alternatively, we could consider periodically cleaning up the /src directories.
>>>>>>>>
>>>>>>>> I would suggest moving the cron task from the workers' internal cron scripts to the inventory job (https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51). That way we can see all the cron jobs as part of the source tree, and adjust frequencies and clean up the code with PRs. I do not know how the internal cron scripts are created and maintained, or how they would be recreated for new worker instances.
>>>>>>>>
>>>>>>>> /cc +Tyson Hamilton <tyso...@google.com>
>>>>>>>>
>>>>>>>> On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski <damian.gadom...@polidea.com> wrote:
>>>>>>>>
>>>>>>>>> Hey,
>>>>>>>>>
>>>>>>>>> I've recently created a solution for the growing /tmp directory. Part of it is the job mentioned by Tyson: beam_Clean_tmp_directory. It's intentionally not triggered by cron and should be a last-resort solution for strange cases.
>>>>>>>>>
>>>>>>>>> Along with that job, I've also updated every worker with an internal cron script. It runs once a week and deletes all the files (and only files) that were not accessed for at least three days. That's designed to be as safe as possible for the jobs running on the worker (files still in use are not deleted) and to be insensitive to the current workload on the machine: the cleanup will always happen, even if some long-running or stuck jobs are blocking the machine. (A sketch of the crontab entry is below.)
>>>>>>>>>
>>>>>>>>> I also think that the current "No space left" errors may be a consequence of the growing workspace directory rather than /tmp. I didn't do any detailed analysis, but, for example, on apache-beam-jenkins-7 the workspace directory is currently 158 GB while /tmp is only 16 GB. We should either guarantee enough disk to hold the workspaces of all jobs (because eventually every worker will execute each job) or also clear the workspaces in some way.
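>>>>>>>>>
>>>>>>>>> Roughly (illustrative; the exact schedule and flags on the workers may differ):
>>>>>>>>>
>>>>>>>>>   # weekly: delete files (never directories) under /tmp
>>>>>>>>>   # that have not been accessed for at least three days
>>>>>>>>>   0 6 * * 1  find /tmp -xdev -type f -atime +2 -delete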
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Damian
>>>>>>>>>
>>>>>>>>> On Mon, Jul 20, 2020 at 10:43 AM Maximilian Michels <m...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> +1 for scheduling it via a cron job if it won't lead to test failures while running. I'm not a Jenkins expert, but maybe there is a notion of running exclusively, while no other tasks are running?
>>>>>>>>>>
>>>>>>>>>> -Max
>>>>>>>>>>
>>>>>>>>>> On 17.07.20 21:49, Tyson Hamilton wrote:
>>>>>>>>>> > FYI, a job was introduced to do this in Jenkins: beam_Clean_tmp_directory.
>>>>>>>>>> >
>>>>>>>>>> > Currently it needs to be run manually. I'm seeing some out-of-disk errors in precommit tests at the moment; perhaps we should schedule this job with cron?
>>>>>>>>>> >
>>>>>>>>>> > On 2020/03/11 19:31:13, Heejong Lee <heej...@google.com> wrote:
>>>>>>>>>> >> Still seeing "no space left on device" errors on jenkins-7 (for example: https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/).
>>>>>>>>>> >>
>>>>>>>>>> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <amyrv...@google.com> wrote:
>>>>>>>>>> >>
>>>>>>>>>> >>> Did a one-time cleanup of tmp files owned by jenkins older than 3 days (roughly the command at the end of this mail). Agreed that we need a longer-term solution.
>>>>>>>>>> >>>
>>>>>>>>>> >>> Tests are passing recently on all executors except jenkins-12, which has not scheduled builds for the past 13 days. Not scheduling:
>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
>>>>>>>>>> >>> Recent passing builds:
>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
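>>>>>>>>>> >>>
>>>>>>>>>> >>> The cleanup command was along these lines:
>>>>>>>>>> >>>
>>>>>>>>>> >>>   # approximate sketch, not the exact invocation used
>>>>>>>>>> >>>   sudo find /tmp -xdev -user jenkins -type f -mtime +3 -delete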
>>>>>>>>>> >>>
>>>>>>>>>> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <al...@google.com> wrote:
>>>>>>>>>> >>>
>>>>>>>>>> >>>> +Alan Myrvold <amyrv...@google.com> is doing a one-time cleanup. I agree that we need a solution that automates this task or addresses the root cause of the buildup.
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia <michal.wale...@polidea.com> wrote:
>>>>>>>>>> >>>>
>>>>>>>>>> >>>>> Hi there,
>>>>>>>>>> >>>>> It seems we have a problem with the Jenkins workers again. Nodes 1 and 7 are both failing jobs with "No space left on device". Who is the best person to contact in these cases (someone with access permissions on the workers)?
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> I've also noticed that such errors are becoming more and more frequent recently, and I'd like to discuss how this can be remedied. Can a cleanup task be automated on Jenkins somehow?
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> Regards,
>>>>>>>>>> >>>>> Michal
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> --
>>>>>>>>>> >>>>> Michał Walenia
>>>>>>>>>> >>>>> Polidea <https://www.polidea.com/> | Software Engineer
>>>>>>>>>> >>>>> M: +48 791 432 002
>>>>>>>>>> >>>>> E: michal.wale...@polidea.com
>>>>>>>>>> >>>>> Unique Tech
>>>>>>>>>> >>>>> Check out our projects! <https://www.polidea.com/our-work>