What about the workspaces, which can take up 175GB in some cases (see above)? I'm working on getting them cleaned up automatically: https://github.com/apache/beam/pull/12326
My opinion is that we would get more mileage out of fixing the jobs that leave behind files in /tmp and images/containers in Docker. This would also help keep development machines clean.

On Mon, Jul 27, 2020 at 5:31 PM Tyson Hamilton <tyso...@google.com> wrote:
> Here is a summary of how I understand things:
>
> - /tmp and /var/lib/docker are the culprits for filling up disks
> - the inventory Jenkins job runs every 12 hours and runs a docker prune to clean up images older than 24 hours
> - a crontab on each machine runs weekly and cleans up /tmp files older than three days
>
> This doesn't seem to be working, since we're still running out of disk periodically and requiring manual intervention. Knobs and options we have available:
>
> 1. increase the frequency of deleting files
> 2. decrease the number of days required to delete a file (e.g. older than 2 days)
>
> The execution methods we have available are:
>
> A. cron
>    - pro: runs even if a job gets stuck in Jenkins due to a full disk
>    - con: config is baked into the VM, which is tough to update and is not discoverable or well documented
> B. inventory job
>    - pro: easy to update; already runs every 12h
>    - con: could get stuck if a Jenkins agent runs out of disk or is otherwise stuck; tied to the frequency of all the other inventory jobs
> C. startup scripts for the VMs that set up the cron job whenever a VM is restarted
>    - pro: similar to A, and easy to update
>    - con: similar to A
>
> Between the three I prefer B, because it is consistent with the other inventory jobs. If it turns out that stuck jobs often prevent the inventory job from being scheduled, we could further investigate C to avoid having to rebuild the VM images repeatedly.
>
> Any objections or comments? If not, we'll go forward with B and reduce the age check from 3 days to 2 days.
>
>
> On 2020/07/24 20:13:29, Ahmet Altay <al...@google.com> wrote:
> > Tests may not be doing docker cleanup. 
Inventory job runs a docker prune > > every 12 hours for images older than 24 hrs [1]. Randomly looking at one > of > > the recent runs [2], it cleaned up a long list of containers consuming > > 30+GB space. That should be just 12 hours worth of containers. > > > > [1] > > > https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L69 > > [2] > > > https://ci-beam.apache.org/job/beam_Inventory_apache-beam-jenkins-14/501/console > > > > On Fri, Jul 24, 2020 at 1:07 PM Tyson Hamilton <tyso...@google.com> > wrote: > > > > > Yes, these are on the same volume in the /var/lib/docker directory. I'm > > > unsure if they clean up leftover images. > > > > > > On Fri, Jul 24, 2020 at 12:52 PM Udi Meiri <eh...@google.com> wrote: > > > > > >> I forgot Docker images: > > >> > > >> ehudm@apache-ci-beam-jenkins-3:~$ sudo docker system df > > >> TYPE TOTAL ACTIVE SIZE > > >> RECLAIMABLE > > >> Images 88 9 125.4GB > > >> 124.2GB (99%) > > >> Containers 40 4 7.927GB > > >> 7.871GB (99%) > > >> Local Volumes 47 0 3.165GB > > >> 3.165GB (100%) > > >> Build Cache 0 0 0B > > >> 0B > > >> > > >> There are about 90 images on that machine, with all but 1 less than 48 > > >> hours old. > > >> I think the docker test jobs need to try harder at cleaning up their > > >> leftover images. (assuming they're already doing it?) > > >> > > >> On Fri, Jul 24, 2020 at 12:31 PM Udi Meiri <eh...@google.com> wrote: > > >> > > >>> The additional slots (@3 directories) take up even more space now > than > > >>> before. > > >>> > > >>> I'm testing out https://github.com/apache/beam/pull/12326 which > could > > >>> help by cleaning up workspaces after a run (just started a seed job). 
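The prune and disk-usage commands discussed above can be sketched as follows. This is a hedged approximation, not the actual contents of job_Inventory.groovy, and the `docker_prune_older_than` wrapper name is hypothetical; with `DRY_RUN=1` it only prints the command it would run, so it is safe to inspect anywhere.

```shell
# Hypothetical wrapper for the age-limited prune the inventory job performs.
docker_prune_older_than() {
  local age="${1:-24h}"
  # --all removes unused (not just dangling) images; --force skips the
  # confirmation prompt; --filter "until=24h" limits the prune to objects
  # created more than 24 hours ago.
  local cmd=(docker system prune --all --force --filter "until=$age")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

`docker system df` (quoted above in Udi's output) shows the reclaimable space per object type, so running it before and after the prune shows how much was actually freed.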
> > >>> > > >>> On Fri, Jul 24, 2020 at 12:13 PM Tyson Hamilton <tyso...@google.com> > > >>> wrote: > > >>> > > >>>> 664M beam_PreCommit_JavaPortabilityApi_Commit > > >>>> 656M beam_PreCommit_JavaPortabilityApi_Commit@2 > > >>>> 611M beam_PreCommit_JavaPortabilityApi_Cron > > >>>> 616M beam_PreCommit_JavaPortabilityApiJava11_Commit > > >>>> 598M beam_PreCommit_JavaPortabilityApiJava11_Commit@2 > > >>>> 662M beam_PreCommit_JavaPortabilityApiJava11_Cron > > >>>> 2.9G beam_PreCommit_Portable_Python_Commit > > >>>> 2.9G beam_PreCommit_Portable_Python_Commit@2 > > >>>> 1.7G beam_PreCommit_Portable_Python_Commit@3 > > >>>> 3.4G beam_PreCommit_Portable_Python_Cron > > >>>> 1.9G beam_PreCommit_Python2_PVR_Flink_Commit > > >>>> 1.4G beam_PreCommit_Python2_PVR_Flink_Cron > > >>>> 1.3G beam_PreCommit_Python2_PVR_Flink_Phrase > > >>>> 6.2G beam_PreCommit_Python_Commit > > >>>> 7.5G beam_PreCommit_Python_Commit@2 > > >>>> 7.5G beam_PreCommit_Python_Cron > > >>>> 1012M beam_PreCommit_PythonDocker_Commit > > >>>> 1011M beam_PreCommit_PythonDocker_Commit@2 > > >>>> 1011M beam_PreCommit_PythonDocker_Commit@3 > > >>>> 1002M beam_PreCommit_PythonDocker_Cron > > >>>> 877M beam_PreCommit_PythonFormatter_Commit > > >>>> 988M beam_PreCommit_PythonFormatter_Cron > > >>>> 986M beam_PreCommit_PythonFormatter_Phrase > > >>>> 1.7G beam_PreCommit_PythonLint_Commit > > >>>> 2.1G beam_PreCommit_PythonLint_Cron > > >>>> 7.5G beam_PreCommit_Python_Phrase > > >>>> 346M beam_PreCommit_RAT_Commit > > >>>> 341M beam_PreCommit_RAT_Cron > > >>>> 338M beam_PreCommit_Spotless_Commit > > >>>> 339M beam_PreCommit_Spotless_Cron > > >>>> 5.5G beam_PreCommit_SQL_Commit > > >>>> 5.5G beam_PreCommit_SQL_Cron > > >>>> 5.5G beam_PreCommit_SQL_Java11_Commit > > >>>> 750M beam_PreCommit_Website_Commit > > >>>> 750M beam_PreCommit_Website_Commit@2 > > >>>> 750M beam_PreCommit_Website_Cron > > >>>> 764M beam_PreCommit_Website_Stage_GCS_Commit > > >>>> 771M beam_PreCommit_Website_Stage_GCS_Cron > > >>>> 336M 
beam_Prober_CommunityMetrics > > >>>> 693M beam_python_mongoio_load_test > > >>>> 339M beam_SeedJob > > >>>> 333M beam_SeedJob_Standalone > > >>>> 334M beam_sonarqube_report > > >>>> 556M beam_SQLBigQueryIO_Batch_Performance_Test_Java > > >>>> 175G total > > >>>> > > >>>> On Fri, Jul 24, 2020 at 12:04 PM Tyson Hamilton <tyso...@google.com > > > > >>>> wrote: > > >>>> > > >>>>> Ya looks like something in the workspaces is taking up room: > > >>>>> > > >>>>> @apache-ci-beam-jenkins-8:/home/jenkins$ sudo du -shc . > > >>>>> 191G . > > >>>>> 191G total > > >>>>> > > >>>>> > > >>>>> On Fri, Jul 24, 2020 at 11:44 AM Tyson Hamilton < > tyso...@google.com> > > >>>>> wrote: > > >>>>> > > >>>>>> Node 8 is also full. The partition that /tmp is on is here: > > >>>>>> > > >>>>>> Filesystem Size Used Avail Use% Mounted on > > >>>>>> /dev/sda1 485G 482G 2.9G 100% / > > >>>>>> > > >>>>>> however after cleaning up tmp with the crontab command, there is > only > > >>>>>> 8G usage yet it still remains 100% full: > > >>>>>> > > >>>>>> @apache-ci-beam-jenkins-8:/tmp$ sudo du -shc /tmp > > >>>>>> 8.0G /tmp > > >>>>>> 8.0G total > > >>>>>> > > >>>>>> The workspaces are in the /home/jenkins/jenkins-slave/workspace > > >>>>>> directory. When I run a du on that, it takes really long. I'll > let it keep > > >>>>>> running for a while to see if it ever returns a result but so far > this > > >>>>>> seems suspect. > > >>>>>> > > >>>>>> > > >>>>>> > > >>>>>> > > >>>>>> > > >>>>>> On Fri, Jul 24, 2020 at 11:19 AM Tyson Hamilton < > tyso...@google.com> > > >>>>>> wrote: > > >>>>>> > > >>>>>>> Everything I've been looking at is in the /tmp dir. Where are the > > >>>>>>> workspaces, or what are the named? > > >>>>>>> > > >>>>>>> > > >>>>>>> > > >>>>>>> > > >>>>>>> On Fri, Jul 24, 2020 at 11:03 AM Udi Meiri <eh...@google.com> > wrote: > > >>>>>>> > > >>>>>>>> I'm curious to what you find. Was it /tmp or the workspaces > using > > >>>>>>>> up the space? 
> > >>>>>>>> > > >>>>>>>> On Fri, Jul 24, 2020 at 10:57 AM Tyson Hamilton < > tyso...@google.com> > > >>>>>>>> wrote: > > >>>>>>>> > > >>>>>>>>> Bleck. I just realized that it is 'offline' so that won't work. > > >>>>>>>>> I'll clean up manually on the machine using the cron command. > > >>>>>>>>> > > >>>>>>>>> On Fri, Jul 24, 2020 at 10:56 AM Tyson Hamilton < > > >>>>>>>>> tyso...@google.com> wrote: > > >>>>>>>>> > > >>>>>>>>>> Something isn't working with the current set up because node > 15 > > >>>>>>>>>> appears to be out of space and is currently 'offline' > according to Jenkins. > > >>>>>>>>>> Can someone run the cleanup job? The machine is full, > > >>>>>>>>>> > > >>>>>>>>>> @apache-ci-beam-jenkins-15:/tmp$ df -h > > >>>>>>>>>> Filesystem Size Used Avail Use% Mounted on > > >>>>>>>>>> udev 52G 0 52G 0% /dev > > >>>>>>>>>> tmpfs 11G 265M 10G 3% /run > > >>>>>>>>>> */dev/sda1 485G 484G 880M 100% /* > > >>>>>>>>>> tmpfs 52G 0 52G 0% /dev/shm > > >>>>>>>>>> tmpfs 5.0M 0 5.0M 0% /run/lock > > >>>>>>>>>> tmpfs 52G 0 52G 0% /sys/fs/cgroup > > >>>>>>>>>> tmpfs 11G 0 11G 0% /run/user/1017 > > >>>>>>>>>> tmpfs 11G 0 11G 0% /run/user/1037 > > >>>>>>>>>> > > >>>>>>>>>> apache-ci-beam-jenkins-15:/tmp$ sudo du -ah --time . | sort > -rhk > > >>>>>>>>>> 1,1 | head -n 20 > > >>>>>>>>>> 20G 2020-07-24 17:52 . 
> > >>>>>>>>>> 580M 2020-07-22 17:31 ./junit1031982597110125586 > > >>>>>>>>>> 517M 2020-07-22 17:31 > > >>>>>>>>>> > ./junit1031982597110125586/junit8739924829337821410/heap_dump.hprof > > >>>>>>>>>> 517M 2020-07-22 17:31 > > >>>>>>>>>> ./junit1031982597110125586/junit8739924829337821410 > > >>>>>>>>>> 263M 2020-07-22 12:23 ./pip-install-2GUhO_ > > >>>>>>>>>> 263M 2020-07-20 09:30 ./pip-install-sxgwqr > > >>>>>>>>>> 263M 2020-07-17 13:56 ./pip-install-bWSKIV > > >>>>>>>>>> 242M 2020-07-21 20:25 ./beam-pipeline-tempmByU6T > > >>>>>>>>>> 242M 2020-07-21 20:21 ./beam-pipeline-tempV85xeK > > >>>>>>>>>> 242M 2020-07-21 20:15 ./beam-pipeline-temp7dJROJ > > >>>>>>>>>> 236M 2020-07-21 20:25 > > >>>>>>>>>> ./beam-pipeline-tempmByU6T/tmpOWj3Yr > > >>>>>>>>>> 236M 2020-07-21 20:21 > > >>>>>>>>>> ./beam-pipeline-tempV85xeK/tmppbQHB3 > > >>>>>>>>>> 236M 2020-07-21 20:15 > > >>>>>>>>>> ./beam-pipeline-temp7dJROJ/tmpgOXPKW > > >>>>>>>>>> 111M 2020-07-23 00:57 ./pip-install-1JnyNE > > >>>>>>>>>> 105M 2020-07-23 00:17 > ./beam-artifact1374651823280819755 > > >>>>>>>>>> 105M 2020-07-23 00:16 > ./beam-artifact5050755582921936972 > > >>>>>>>>>> 105M 2020-07-23 00:16 > ./beam-artifact1834064452502646289 > > >>>>>>>>>> 105M 2020-07-23 00:15 > ./beam-artifact682561790267074916 > > >>>>>>>>>> 105M 2020-07-23 00:15 > ./beam-artifact4691304965824489394 > > >>>>>>>>>> 105M 2020-07-23 00:14 > ./beam-artifact4050383819822604421 > > >>>>>>>>>> > > >>>>>>>>>> On Wed, Jul 22, 2020 at 12:03 PM Robert Bradshaw < > > >>>>>>>>>> rober...@google.com> wrote: > > >>>>>>>>>> > > >>>>>>>>>>> On Wed, Jul 22, 2020 at 11:57 AM Tyson Hamilton < > > >>>>>>>>>>> tyso...@google.com> wrote: > > >>>>>>>>>>> > > >>>>>>>>>>>> Ah I see, thanks Kenn. 
I found some advice from the Apache > > >>>>>>>>>>>> infra wiki that also suggests using a tmpdir inside the > workspace [1]: > > >>>>>>>>>>>> > > >>>>>>>>>>>> Procedures Projects can take to clean up disk space > > >>>>>>>>>>>> > > >>>>>>>>>>>> Projects can help themselves and Infra by taking some basic > > >>>>>>>>>>>> steps to help clean up their jobs after themselves on the > build nodes. > > >>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>>>> 1. Use a ./tmp dir in your jobs workspace. That way it > gets > > >>>>>>>>>>>> cleaned up when job workspaces expire. > > >>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>>> Tests should be (able to be) written to use the standard > > >>>>>>>>>>> temporary file mechanisms, and the environment set up on > Jenkins such that > > >>>>>>>>>>> that falls into the respective workspaces. Ideally this > should be as simple > > >>>>>>>>>>> as setting the TMPDIR (or similar) environment variable (and > making sure it > > >>>>>>>>>>> exists/is writable). > > >>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>>>> 1. Configure your jobs to wipe workspaces on start or > > >>>>>>>>>>>> finish. > > >>>>>>>>>>>> 2. Configure your jobs to only keep 5 or 10 previous > builds. > > >>>>>>>>>>>> 3. Configure your jobs to only keep 5 or 10 previous > > >>>>>>>>>>>> artifacts. > > >>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>>>> [1]: > > >>>>>>>>>>>> > https://cwiki.apache.org/confluence/display/INFRA/Disk+Space+cleanup+of+Jenkins+nodes > > >>>>>>>>>>>> > > >>>>>>>>>>>> On Wed, Jul 22, 2020 at 8:06 AM Kenneth Knowles < > > >>>>>>>>>>>> k...@apache.org> wrote: > > >>>>>>>>>>>> > > >>>>>>>>>>>>> Those file listings look like the result of using standard > > >>>>>>>>>>>>> temp file APIs but with TMPDIR set to /tmp. 
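The TMPDIR approach Robert and Kenn describe can be sketched as below. This is a minimal sketch under the assumption that `WORKSPACE` (the variable Jenkins sets for each job) points at the job workspace; it is not an existing Beam setup script.

```shell
# Point standard temp-file APIs at a ./tmp dir inside the job workspace, so
# temp files are removed whenever Jenkins expires or wipes the workspace.
export TMPDIR="${WORKSPACE:-$PWD}/tmp"
mkdir -p "$TMPDIR"

# Anything using the standard mechanisms (mktemp, Python's tempfile, etc.)
# now lands inside the workspace instead of the shared /tmp:
scratch="$(mktemp)"
```

Code that hard-codes /tmp still escapes this, which is why Kenn's suggestion of running the tests with TMPDIR pointed elsewhere and /tmp made unwritable is a good way to find offenders.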
> > >>>>>>>>>>>>> > > >>>>>>>>>>>>> On Mon, Jul 20, 2020 at 7:55 PM Tyson Hamilton < > > >>>>>>>>>>>>> tyso...@google.com> wrote: > > >>>>>>>>>>>>> > > >>>>>>>>>>>>>> Jobs are hermetic as far as I can tell and use unique > > >>>>>>>>>>>>>> subdirectories inside of /tmp. Here is a quick look into > two examples: > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> @apache-ci-beam-jenkins-4:/tmp$ sudo du -ah --time . | > sort > > >>>>>>>>>>>>>> -rhk 1,1 | head -n 20 > > >>>>>>>>>>>>>> 1.6G 2020-07-21 02:25 . > > >>>>>>>>>>>>>> 242M 2020-07-17 18:48 ./beam-pipeline-temp3ybuY4 > > >>>>>>>>>>>>>> 242M 2020-07-17 18:46 ./beam-pipeline-tempuxjiPT > > >>>>>>>>>>>>>> 242M 2020-07-17 18:44 ./beam-pipeline-tempVpg1ME > > >>>>>>>>>>>>>> 242M 2020-07-17 18:42 ./beam-pipeline-tempJ4EpyB > > >>>>>>>>>>>>>> 242M 2020-07-17 18:39 ./beam-pipeline-tempepea7Q > > >>>>>>>>>>>>>> 242M 2020-07-17 18:35 ./beam-pipeline-temp79qot2 > > >>>>>>>>>>>>>> 236M 2020-07-17 18:48 > > >>>>>>>>>>>>>> ./beam-pipeline-temp3ybuY4/tmpy_Ytzz > > >>>>>>>>>>>>>> 236M 2020-07-17 18:46 > > >>>>>>>>>>>>>> ./beam-pipeline-tempuxjiPT/tmpN5_UfJ > > >>>>>>>>>>>>>> 236M 2020-07-17 18:44 > > >>>>>>>>>>>>>> ./beam-pipeline-tempVpg1ME/tmpxSm8pX > > >>>>>>>>>>>>>> 236M 2020-07-17 18:42 > > >>>>>>>>>>>>>> ./beam-pipeline-tempJ4EpyB/tmpMZJU76 > > >>>>>>>>>>>>>> 236M 2020-07-17 18:39 > > >>>>>>>>>>>>>> ./beam-pipeline-tempepea7Q/tmpWy1vWX > > >>>>>>>>>>>>>> 236M 2020-07-17 18:35 > > >>>>>>>>>>>>>> ./beam-pipeline-temp79qot2/tmpvN7vWA > > >>>>>>>>>>>>>> 3.7M 2020-07-17 18:48 > > >>>>>>>>>>>>>> ./beam-pipeline-temp3ybuY4/tmprlh_di > > >>>>>>>>>>>>>> 3.7M 2020-07-17 18:46 > > >>>>>>>>>>>>>> ./beam-pipeline-tempuxjiPT/tmpLmVWfe > > >>>>>>>>>>>>>> 3.7M 2020-07-17 18:44 > > >>>>>>>>>>>>>> ./beam-pipeline-tempVpg1ME/tmpvrxbY7 > > >>>>>>>>>>>>>> 3.7M 2020-07-17 18:42 > > >>>>>>>>>>>>>> ./beam-pipeline-tempJ4EpyB/tmpLTb6Mj > > >>>>>>>>>>>>>> 3.7M 2020-07-17 18:39 > > >>>>>>>>>>>>>> ./beam-pipeline-tempepea7Q/tmptYF1v1 > > >>>>>>>>>>>>>> 3.7M 
2020-07-17 18:35 > > >>>>>>>>>>>>>> ./beam-pipeline-temp79qot2/tmplfV0Rg > > >>>>>>>>>>>>>> 2.7M 2020-07-17 20:10 ./pip-install-q9l227ef > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> @apache-ci-beam-jenkins-11:/tmp$ sudo du -ah --time . | > sort > > >>>>>>>>>>>>>> -rhk 1,1 | head -n 20 > > >>>>>>>>>>>>>> 817M 2020-07-21 02:26 . > > >>>>>>>>>>>>>> 242M 2020-07-19 12:14 ./beam-pipeline-tempUTXqlM > > >>>>>>>>>>>>>> 242M 2020-07-19 12:11 ./beam-pipeline-tempx3Yno3 > > >>>>>>>>>>>>>> 242M 2020-07-19 12:05 ./beam-pipeline-tempyCrMYq > > >>>>>>>>>>>>>> 236M 2020-07-19 12:14 > > >>>>>>>>>>>>>> ./beam-pipeline-tempUTXqlM/tmpstXoL0 > > >>>>>>>>>>>>>> 236M 2020-07-19 12:11 > > >>>>>>>>>>>>>> ./beam-pipeline-tempx3Yno3/tmpnnVn65 > > >>>>>>>>>>>>>> 236M 2020-07-19 12:05 > > >>>>>>>>>>>>>> ./beam-pipeline-tempyCrMYq/tmpRF0iNs > > >>>>>>>>>>>>>> 3.7M 2020-07-19 12:14 > > >>>>>>>>>>>>>> ./beam-pipeline-tempUTXqlM/tmpbJjUAQ > > >>>>>>>>>>>>>> 3.7M 2020-07-19 12:11 > > >>>>>>>>>>>>>> ./beam-pipeline-tempx3Yno3/tmpsmmzqe > > >>>>>>>>>>>>>> 3.7M 2020-07-19 12:05 > > >>>>>>>>>>>>>> ./beam-pipeline-tempyCrMYq/tmp5b3ZvY > > >>>>>>>>>>>>>> 2.0M 2020-07-19 12:14 > > >>>>>>>>>>>>>> ./beam-pipeline-tempUTXqlM/tmpoj3orz > > >>>>>>>>>>>>>> 2.0M 2020-07-19 12:11 > > >>>>>>>>>>>>>> ./beam-pipeline-tempx3Yno3/tmptng9sZ > > >>>>>>>>>>>>>> 2.0M 2020-07-19 12:05 > > >>>>>>>>>>>>>> ./beam-pipeline-tempyCrMYq/tmpWp6njc > > >>>>>>>>>>>>>> 1.2M 2020-07-19 12:14 > > >>>>>>>>>>>>>> ./beam-pipeline-tempUTXqlM/tmphgdj35 > > >>>>>>>>>>>>>> 1.2M 2020-07-19 12:11 > > >>>>>>>>>>>>>> ./beam-pipeline-tempx3Yno3/tmp8ySXpm > > >>>>>>>>>>>>>> 1.2M 2020-07-19 12:05 > > >>>>>>>>>>>>>> ./beam-pipeline-tempyCrMYq/tmpNVEJ4e > > >>>>>>>>>>>>>> 992K 2020-07-12 12:00 ./junit642086915811430564 > > >>>>>>>>>>>>>> 988K 2020-07-12 12:00 > ./junit642086915811430564/beam > > >>>>>>>>>>>>>> 984K 2020-07-12 12:00 > > >>>>>>>>>>>>>> ./junit642086915811430564/beam/nodes > > >>>>>>>>>>>>>> 980K 2020-07-12 12:00 > > 
>>>>>>>>>>>>>> ./junit642086915811430564/beam/nodes/0 > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 6:46 PM Udi Meiri < > eh...@google.com> > > >>>>>>>>>>>>>> wrote: > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> You're right, job workspaces should be hermetic. > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles < > > >>>>>>>>>>>>>>> k...@apache.org> wrote: > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> I'm probably late to this discussion and missing > something, > > >>>>>>>>>>>>>>>> but why are we writing to /tmp at all? I would expect > TMPDIR to point > > >>>>>>>>>>>>>>>> somewhere inside the job directory that will be wiped > by Jenkins, and I > > >>>>>>>>>>>>>>>> would expect code to always create temp files via APIs > that respect this. > > >>>>>>>>>>>>>>>> Is Jenkins not cleaning up? Do we not have the ability > to set this up? Do > > >>>>>>>>>>>>>>>> we have bugs in our code (that we could probably find > by setting TMPDIR to > > >>>>>>>>>>>>>>>> somewhere not-/tmp and running the tests without write > permission to /tmp, > > >>>>>>>>>>>>>>>> etc) > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> Kenn > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay < > > >>>>>>>>>>>>>>>> al...@google.com> wrote: > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> Related to workspace directory growth, +Udi Meiri > > >>>>>>>>>>>>>>>>> <eh...@google.com> filed a relevant issue previously ( > > >>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/BEAM-9865) for > > >>>>>>>>>>>>>>>>> cleaning up workspace directory after successful jobs. > Alternatively, we > > >>>>>>>>>>>>>>>>> can consider periodically cleaning up the /src > directories. 
> > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> I would suggest moving the cron task from internal cron > > >>>>>>>>>>>>>>>>> scripts to the inventory job ( > > >>>>>>>>>>>>>>>>> > https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51 > ). > > >>>>>>>>>>>>>>>>> That way, we can see all the cron jobs as part of the > source tree, adjust > > >>>>>>>>>>>>>>>>> frequencies and clean up codes with PRs. I do not know > how internal cron > > >>>>>>>>>>>>>>>>> scripts are created, maintained, and how would they be > recreated for new > > >>>>>>>>>>>>>>>>> worker instances. > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> /cc +Tyson Hamilton <tyso...@google.com> > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski < > > >>>>>>>>>>>>>>>>> damian.gadom...@polidea.com> wrote: > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> Hey, > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> I've recently created a solution for the growing /tmp > > >>>>>>>>>>>>>>>>>> directory. Part of it is the job mentioned by Tyson: > > >>>>>>>>>>>>>>>>>> *beam_Clean_tmp_directory*. It's intentionally not > > >>>>>>>>>>>>>>>>>> triggered by cron and should be a last resort > solution for some strange > > >>>>>>>>>>>>>>>>>> cases. > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> Along with that job, I've also updated every worker > with > > >>>>>>>>>>>>>>>>>> an internal cron script. It's being executed once a > week and deletes all > > >>>>>>>>>>>>>>>>>> the files (and only files) that were not accessed for > at least three days. > > >>>>>>>>>>>>>>>>>> That's designed to be as safe as possible for the > running jobs on the > > >>>>>>>>>>>>>>>>>> worker (not to delete the files that are still in > use), and also to be > > >>>>>>>>>>>>>>>>>> insensitive to the current workload on the machine. > The cleanup will always > > >>>>>>>>>>>>>>>>>> happen, even if some long-running/stuck jobs are > blocking the machine. 
> > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> I also think that currently the "No space left" errors > > >>>>>>>>>>>>>>>>>> may be a consequence of growing workspace directory > rather than /tmp. I > > >>>>>>>>>>>>>>>>>> didn't do any detailed analysis but e.g. currently, on > > >>>>>>>>>>>>>>>>>> apache-beam-jenkins-7 the workspace directory size is > 158 GB while /tmp is > > >>>>>>>>>>>>>>>>>> only 16 GB. We should either guarantee the disk size > to hold workspaces for > > >>>>>>>>>>>>>>>>>> all jobs (because eventually, every worker will > execute each job) or clear > > >>>>>>>>>>>>>>>>>> also the workspaces in some way. > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> Regards, > > >>>>>>>>>>>>>>>>>> Damian > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 10:43 AM Maximilian Michels < > > >>>>>>>>>>>>>>>>>> m...@apache.org> wrote: > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>> +1 for scheduling it via a cron job if it won't lead > to > > >>>>>>>>>>>>>>>>>>> test failures > > >>>>>>>>>>>>>>>>>>> while running. Not a Jenkins expert but maybe there > is > > >>>>>>>>>>>>>>>>>>> the notion of > > >>>>>>>>>>>>>>>>>>> running exclusively while no other tasks are running? > > >>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>> -Max > > >>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>> On 17.07.20 21:49, Tyson Hamilton wrote: > > >>>>>>>>>>>>>>>>>>> > FYI there was a job introduced to do this in > Jenkins: > > >>>>>>>>>>>>>>>>>>> beam_Clean_tmp_directory > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> > Currently it needs to be run manually. I'm seeing > some > > >>>>>>>>>>>>>>>>>>> out of disk related errors in precommit tests > currently, perhaps we should > > >>>>>>>>>>>>>>>>>>> schedule this job with cron? 
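The per-worker cron script Damian describes (delete only files, never directories, that have not been accessed for at least three days) could be sketched as follows. The function name and exact flags are assumptions for illustration, not the deployed script.

```shell
# Hypothetical sketch of the weekly cron cleanup: delete regular files under
# $1 whose last access time is more than $2 days ago (default 3, per the thread).
cleanup_old_files() {
  local target_dir="$1"
  local max_age_days="${2:-3}"
  # -xdev: do not cross filesystem boundaries; -type f: files only, so the
  # directories of still-running jobs survive; -atime +N: last accessed
  # strictly more than N days ago.
  find "$target_dir" -xdev -type f -atime "+$max_age_days" -delete
}
```

Lowering the threshold to 2 days, as proposed earlier in the thread, would then just be `cleanup_old_files /tmp 2`.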
> > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> > On 2020/03/11 19:31:13, Heejong Lee < > > >>>>>>>>>>>>>>>>>>> heej...@google.com> wrote: > > >>>>>>>>>>>>>>>>>>> >> Still seeing no space left on device errors on > > >>>>>>>>>>>>>>>>>>> jenkins-7 (for example: > > >>>>>>>>>>>>>>>>>>> >> > > >>>>>>>>>>>>>>>>>>> > https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/ > > >>>>>>>>>>>>>>>>>>> ) > > >>>>>>>>>>>>>>>>>>> >> > > >>>>>>>>>>>>>>>>>>> >> > > >>>>>>>>>>>>>>>>>>> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold < > > >>>>>>>>>>>>>>>>>>> amyrv...@google.com> wrote: > > >>>>>>>>>>>>>>>>>>> >> > > >>>>>>>>>>>>>>>>>>> >>> Did a one time cleanup of tmp files owned by > jenkins > > >>>>>>>>>>>>>>>>>>> older than 3 days. > > >>>>>>>>>>>>>>>>>>> >>> Agree that we need a longer term solution. > > >>>>>>>>>>>>>>>>>>> >>> > > >>>>>>>>>>>>>>>>>>> >>> Passing recent tests on all executors except > > >>>>>>>>>>>>>>>>>>> jenkins-12, which has not > > >>>>>>>>>>>>>>>>>>> >>> scheduled recent builds for the past 13 days. 
Not > > >>>>>>>>>>>>>>>>>>> scheduling: > > >>>>>>>>>>>>>>>>>>> >>> > > >>>>>>>>>>>>>>>>>>> > https://builds.apache.org/computer/apache-beam-jenkins-12/builds > > >>>>>>>>>>>>>>>>>>> >>> < > > >>>>>>>>>>>>>>>>>>> > https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-12/builds&sa=D > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> >>> Recent passing builds: > > >>>>>>>>>>>>>>>>>>> >>> > > >>>>>>>>>>>>>>>>>>> > https://builds.apache.org/computer/apache-beam-jenkins-1/builds > > >>>>>>>>>>>>>>>>>>> >>> < > > >>>>>>>>>>>>>>>>>>> > https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-1/builds&sa=D > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> >>> > > >>>>>>>>>>>>>>>>>>> > https://builds.apache.org/computer/apache-beam-jenkins-2/builds > > >>>>>>>>>>>>>>>>>>> >>> < > > >>>>>>>>>>>>>>>>>>> > https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-2/builds&sa=D > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> >>> > > >>>>>>>>>>>>>>>>>>> > https://builds.apache.org/computer/apache-beam-jenkins-3/builds > > >>>>>>>>>>>>>>>>>>> >>> < > > >>>>>>>>>>>>>>>>>>> > https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-3/builds&sa=D > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> >>> > > >>>>>>>>>>>>>>>>>>> > https://builds.apache.org/computer/apache-beam-jenkins-4/builds > > >>>>>>>>>>>>>>>>>>> >>> < > > >>>>>>>>>>>>>>>>>>> > https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-4/builds&sa=D > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> >>> > > >>>>>>>>>>>>>>>>>>> > https://builds.apache.org/computer/apache-beam-jenkins-5/builds > > >>>>>>>>>>>>>>>>>>> >>> < > > >>>>>>>>>>>>>>>>>>> > https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-5/builds&sa=D > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> >>> > > >>>>>>>>>>>>>>>>>>> > https://builds.apache.org/computer/apache-beam-jenkins-6/builds 
> > >>>>>>>>>>>>>>>>>>> >>> < > > >>>>>>>>>>>>>>>>>>> > https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-6/builds&sa=D > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> >>> > > >>>>>>>>>>>>>>>>>>> > https://builds.apache.org/computer/apache-beam-jenkins-7/builds > > >>>>>>>>>>>>>>>>>>> >>> < > > >>>>>>>>>>>>>>>>>>> > https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-7/builds&sa=D > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> >>> > > >>>>>>>>>>>>>>>>>>> > https://builds.apache.org/computer/apache-beam-jenkins-8/builds > > >>>>>>>>>>>>>>>>>>> >>> < > > >>>>>>>>>>>>>>>>>>> > https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-8/builds&sa=D > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> >>> > > >>>>>>>>>>>>>>>>>>> > https://builds.apache.org/computer/apache-beam-jenkins-9/builds > > >>>>>>>>>>>>>>>>>>> >>> < > > >>>>>>>>>>>>>>>>>>> > https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-9/builds&sa=D > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> >>> > > >>>>>>>>>>>>>>>>>>> > https://builds.apache.org/computer/apache-beam-jenkins-10/builds > > >>>>>>>>>>>>>>>>>>> >>> < > > >>>>>>>>>>>>>>>>>>> > https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-10/builds&sa=D > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> >>> > > >>>>>>>>>>>>>>>>>>> > https://builds.apache.org/computer/apache-beam-jenkins-11/builds > > >>>>>>>>>>>>>>>>>>> >>> < > > >>>>>>>>>>>>>>>>>>> > https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-11/builds&sa=D > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> >>> > > >>>>>>>>>>>>>>>>>>> > https://builds.apache.org/computer/apache-beam-jenkins-13/builds > > >>>>>>>>>>>>>>>>>>> >>> < > > >>>>>>>>>>>>>>>>>>> > https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-13/builds&sa=D > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> 
>>> > > >>>>>>>>>>>>>>>>>>> > https://builds.apache.org/computer/apache-beam-jenkins-14/builds > > >>>>>>>>>>>>>>>>>>> >>> < > > >>>>>>>>>>>>>>>>>>> > https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-14/builds&sa=D > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> >>> > > >>>>>>>>>>>>>>>>>>> > https://builds.apache.org/computer/apache-beam-jenkins-15/builds > > >>>>>>>>>>>>>>>>>>> >>> < > > >>>>>>>>>>>>>>>>>>> > https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-15/builds&sa=D > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> >>> > > >>>>>>>>>>>>>>>>>>> > https://builds.apache.org/computer/apache-beam-jenkins-16/builds > > >>>>>>>>>>>>>>>>>>> >>> < > > >>>>>>>>>>>>>>>>>>> > https://www.google.com/url?q=https://builds.apache.org/computer/apache-beam-jenkins-16/builds&sa=D > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> >>> > > >>>>>>>>>>>>>>>>>>> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay < > > >>>>>>>>>>>>>>>>>>> al...@google.com> wrote: > > >>>>>>>>>>>>>>>>>>> >>> > > >>>>>>>>>>>>>>>>>>> >>>> +Alan Myrvold <amyrv...@google.com> is doing a > one > > >>>>>>>>>>>>>>>>>>> time cleanup. I agree > > >>>>>>>>>>>>>>>>>>> >>>> that we need to have a solution to automate this > > >>>>>>>>>>>>>>>>>>> task or address the root > > >>>>>>>>>>>>>>>>>>> >>>> cause of the buildup. > > >>>>>>>>>>>>>>>>>>> >>>> > > >>>>>>>>>>>>>>>>>>> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia < > > >>>>>>>>>>>>>>>>>>> michal.wale...@polidea.com> > > >>>>>>>>>>>>>>>>>>> >>>> wrote: > > >>>>>>>>>>>>>>>>>>> >>>> > > >>>>>>>>>>>>>>>>>>> >>>>> Hi there, > > >>>>>>>>>>>>>>>>>>> >>>>> it seems we have a problem with Jenkins workers > > >>>>>>>>>>>>>>>>>>> again. Nodes 1 and 7 > > >>>>>>>>>>>>>>>>>>> >>>>> both fail jobs with "No space left on device". 
> > >>>>>>>>>>>>>>>>>>> >>>>> Who is the best person to contact in these > cases > > >>>>>>>>>>>>>>>>>>> (someone with access > > >>>>>>>>>>>>>>>>>>> >>>>> permissions to the workers). > > >>>>>>>>>>>>>>>>>>> >>>>> > > >>>>>>>>>>>>>>>>>>> >>>>> I also noticed that such errors are becoming > more > > >>>>>>>>>>>>>>>>>>> and more frequent > > >>>>>>>>>>>>>>>>>>> >>>>> recently and I'd like to discuss how can this > be > > >>>>>>>>>>>>>>>>>>> remedied. Can a cleanup > > >>>>>>>>>>>>>>>>>>> >>>>> task be automated on Jenkins somehow? > > >>>>>>>>>>>>>>>>>>> >>>>> > > >>>>>>>>>>>>>>>>>>> >>>>> Regards > > >>>>>>>>>>>>>>>>>>> >>>>> Michal > > >>>>>>>>>>>>>>>>>>> >>>>> > > >>>>>>>>>>>>>>>>>>> >>>>> -- > > >>>>>>>>>>>>>>>>>>> >>>>> > > >>>>>>>>>>>>>>>>>>> >>>>> Michał Walenia > > >>>>>>>>>>>>>>>>>>> >>>>> Polidea <https://www.polidea.com/> | Software > > >>>>>>>>>>>>>>>>>>> Engineer > > >>>>>>>>>>>>>>>>>>> >>>>> > > >>>>>>>>>>>>>>>>>>> >>>>> M: +48 791 432 002 <+48%20791%20432%20002> > <+48%20791%20432%20002> < > > >>>>>>>>>>>>>>>>>>> +48791432002 <+48%20791%20432%20002> > <+48%20791%20432%20002>> > > >>>>>>>>>>>>>>>>>>> >>>>> E: michal.wale...@polidea.com > > >>>>>>>>>>>>>>>>>>> >>>>> > > >>>>>>>>>>>>>>>>>>> >>>>> Unique Tech > > >>>>>>>>>>>>>>>>>>> >>>>> Check out our projects! < > > >>>>>>>>>>>>>>>>>>> https://www.polidea.com/our-work> > > >>>>>>>>>>>>>>>>>>> >>>>> > > >>>>>>>>>>>>>>>>>>> >>>> > > >>>>>>>>>>>>>>>>>>> >> > > >>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> > > >