What about the workspaces, which can take up 175GB in some cases (see
above)?
I'm working on getting them cleaned up automatically:
https://github.com/apache/beam/pull/12326

My opinion is that we would get more mileage out of fixing the jobs that
leave behind files in /tmp and images/containers in Docker.
This would also help keep development machines clean.
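
To see which jobs are the offenders, something like this helps (the
commands are illustrative, not from an existing job):

  # Largest week-old leftovers at the top of /tmp (beam-pipeline-temp*,
  # pip-install-*, junit*, ...):
  sudo find /tmp -maxdepth 1 -mtime +7 -exec du -sh {} + 2>/dev/null | sort -rh | head
  # How much Docker disk usage is reclaimable:
  sudo docker system df

Jobs that create temp files via TMPDIR-respecting APIs and remove their
images on exit would leave nothing for the cron cleanups to do.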


On Mon, Jul 27, 2020 at 5:31 PM Tyson Hamilton <tyso...@google.com> wrote:

> Here is a summary of how I understand things:
>
>   - /tmp and /var/lib/docker are the culprits for filling up disks
>   - the inventory Jenkins job runs every 12 hours and does a docker prune
> to clean up images older than 24 hours
>   - a weekly crontab on each machine cleans up /tmp files older than three
> days
>
> This doesn't seem to be working since we're still running out of disk
> periodically and requiring manual intervention. Knobs and options we have
> available:
>
>   1. increase frequency of deleting files
>   2. decrease the age threshold for deleting a file (e.g. delete files
> older than 2 days instead of 3)
>
> The execution methods we have available are:
>
>   A. cron
>     - pro: runs even if a job gets stuck in Jenkins due to full disk
>     - con: config baked into VM which is tough to update, not discoverable
> or documented well
>   B. inventory job
>     - pro: easy to update, runs every 12h already
>     - con: could be blocked if the Jenkins agent runs out of disk or is
> otherwise stuck; tied to the frequency of the other inventory tasks
>   C. configure startup scripts for the VMs that set up the cron job
> anytime the VM is restarted
>     - pro: similar to A. and easy to update
>     - con: similar to A.
>
> Of the three I prefer B because it is consistent with the other inventory
> tasks. If stuck jobs end up preventing the inventory job from being
> scheduled often enough, we could investigate C further to avoid having to
> rebuild the VM images repeatedly.
>
> Any objections or comments? If not, we'll go forward with B and reduce
> the age check from 3 days to 2 days.
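>
> For concreteness, the check the inventory job would run could look roughly
> like this (a sketch; the -user filter is an assumption to avoid touching
> files owned by other accounts):
>
>   # Delete files (only files) under /tmp not accessed for 2+ days.
>   find /tmp -type f -atime +2 -user jenkins -delete 2>/dev/null || true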
>
>
> On 2020/07/24 20:13:29, Ahmet Altay <al...@google.com> wrote:
> > Tests may not be doing docker cleanup. The inventory job runs a docker
> > prune every 12 hours for images older than 24 hrs [1]. Randomly looking
> > at one of the recent runs [2], it cleaned up a long list of containers
> > consuming 30+ GB of space. That should be just 12 hours' worth of
> > containers.
> >
> > [1] https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L69
> > [2] https://ci-beam.apache.org/job/beam_Inventory_apache-beam-jenkins-14/501/console
> >
> > > On Fri, Jul 24, 2020 at 1:07 PM Tyson Hamilton <tyso...@google.com>
> > > wrote:
> >
> > > Yes, these are on the same volume, in the /var/lib/docker directory. I'm
> > > unsure if the jobs clean up their leftover images.
> > >
> > > On Fri, Jul 24, 2020 at 12:52 PM Udi Meiri <eh...@google.com> wrote:
> > >
> > >> I forgot Docker images:
> > >>
> > >> ehudm@apache-ci-beam-jenkins-3:~$ sudo docker system df
> > >> TYPE            TOTAL    ACTIVE    SIZE       RECLAIMABLE
> > >> Images          88       9         125.4GB    124.2GB (99%)
> > >> Containers      40       4         7.927GB    7.871GB (99%)
> > >> Local Volumes   47       0         3.165GB    3.165GB (100%)
> > >> Build Cache     0        0         0B         0B
> > >>
> > >> There are about 90 images on that machine, all but one less than 48
> > >> hours old.
> > >> I think the docker test jobs need to try harder at cleaning up their
> > >> leftover images (assuming they're doing any cleanup already?).
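> > >>
> > >> A pattern the docker test jobs could adopt, sketched here with a
> > >> hypothetical image tag (not an existing Beam script):
> > >>
> > >>   IMAGE="gcr.io/example/beam-test:${BUILD_ID:-local}"  # hypothetical tag
> > >>   cleanup() { docker rmi -f "${IMAGE}" || true; }
> > >>   trap cleanup EXIT  # removes the image even when tests fail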
> > >>
> > >> On Fri, Jul 24, 2020 at 12:31 PM Udi Meiri <eh...@google.com> wrote:
> > >>
> > >>> The additional slots (@3 directories) take up even more space now
> > >>> than before.
> > >>>
> > >>> I'm testing out https://github.com/apache/beam/pull/12326 which could
> > >>> help by cleaning up workspaces after a run (just started a seed job).
> > >>>
> > >>> On Fri, Jul 24, 2020 at 12:13 PM Tyson Hamilton <tyso...@google.com>
> > >>> wrote:
> > >>>
> > >>>> 664M    beam_PreCommit_JavaPortabilityApi_Commit
> > >>>> 656M    beam_PreCommit_JavaPortabilityApi_Commit@2
> > >>>> 611M    beam_PreCommit_JavaPortabilityApi_Cron
> > >>>> 616M    beam_PreCommit_JavaPortabilityApiJava11_Commit
> > >>>> 598M    beam_PreCommit_JavaPortabilityApiJava11_Commit@2
> > >>>> 662M    beam_PreCommit_JavaPortabilityApiJava11_Cron
> > >>>> 2.9G    beam_PreCommit_Portable_Python_Commit
> > >>>> 2.9G    beam_PreCommit_Portable_Python_Commit@2
> > >>>> 1.7G    beam_PreCommit_Portable_Python_Commit@3
> > >>>> 3.4G    beam_PreCommit_Portable_Python_Cron
> > >>>> 1.9G    beam_PreCommit_Python2_PVR_Flink_Commit
> > >>>> 1.4G    beam_PreCommit_Python2_PVR_Flink_Cron
> > >>>> 1.3G    beam_PreCommit_Python2_PVR_Flink_Phrase
> > >>>> 6.2G    beam_PreCommit_Python_Commit
> > >>>> 7.5G    beam_PreCommit_Python_Commit@2
> > >>>> 7.5G    beam_PreCommit_Python_Cron
> > >>>> 1012M   beam_PreCommit_PythonDocker_Commit
> > >>>> 1011M   beam_PreCommit_PythonDocker_Commit@2
> > >>>> 1011M   beam_PreCommit_PythonDocker_Commit@3
> > >>>> 1002M   beam_PreCommit_PythonDocker_Cron
> > >>>> 877M    beam_PreCommit_PythonFormatter_Commit
> > >>>> 988M    beam_PreCommit_PythonFormatter_Cron
> > >>>> 986M    beam_PreCommit_PythonFormatter_Phrase
> > >>>> 1.7G    beam_PreCommit_PythonLint_Commit
> > >>>> 2.1G    beam_PreCommit_PythonLint_Cron
> > >>>> 7.5G    beam_PreCommit_Python_Phrase
> > >>>> 346M    beam_PreCommit_RAT_Commit
> > >>>> 341M    beam_PreCommit_RAT_Cron
> > >>>> 338M    beam_PreCommit_Spotless_Commit
> > >>>> 339M    beam_PreCommit_Spotless_Cron
> > >>>> 5.5G    beam_PreCommit_SQL_Commit
> > >>>> 5.5G    beam_PreCommit_SQL_Cron
> > >>>> 5.5G    beam_PreCommit_SQL_Java11_Commit
> > >>>> 750M    beam_PreCommit_Website_Commit
> > >>>> 750M    beam_PreCommit_Website_Commit@2
> > >>>> 750M    beam_PreCommit_Website_Cron
> > >>>> 764M    beam_PreCommit_Website_Stage_GCS_Commit
> > >>>> 771M    beam_PreCommit_Website_Stage_GCS_Cron
> > >>>> 336M    beam_Prober_CommunityMetrics
> > >>>> 693M    beam_python_mongoio_load_test
> > >>>> 339M    beam_SeedJob
> > >>>> 333M    beam_SeedJob_Standalone
> > >>>> 334M    beam_sonarqube_report
> > >>>> 556M    beam_SQLBigQueryIO_Batch_Performance_Test_Java
> > >>>> 175G    total
> > >>>>
> > >>>> On Fri, Jul 24, 2020 at 12:04 PM Tyson Hamilton <tyso...@google.com>
> > >>>> wrote:
> > >>>>
> > >>>>> Ya looks like something in the workspaces is taking up room:
> > >>>>>
> > >>>>> @apache-ci-beam-jenkins-8:/home/jenkins$ sudo du -shc .
> > >>>>> 191G    .
> > >>>>> 191G    total
> > >>>>>
> > >>>>>
> > >>>>> On Fri, Jul 24, 2020 at 11:44 AM Tyson Hamilton <tyso...@google.com>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> Node 8 is also full. The partition that /tmp is on is here:
> > >>>>>>
> > >>>>>> Filesystem      Size  Used Avail Use% Mounted on
> > >>>>>> /dev/sda1       485G  482G  2.9G 100% /
> > >>>>>>
> > >>>>>> however, after cleaning up /tmp with the crontab command, there is
> > >>>>>> only 8G of usage yet the partition remains 100% full:
> > >>>>>>
> > >>>>>> @apache-ci-beam-jenkins-8:/tmp$ sudo du -shc /tmp
> > >>>>>> 8.0G    /tmp
> > >>>>>> 8.0G    total
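> > >>>>>>
> > >>>>>> Two quick checks that could explain an 8G-vs-100% mismatch (a
> > >>>>>> sketch):
> > >>>>>>
> > >>>>>>   sudo lsof +L1 | head    # deleted files still held open by processes
> > >>>>>>   sudo du -xsh /* 2>/dev/null | sort -rh | head   # which dirs really hold the space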
> > >>>>>>
> > >>>>>> The workspaces are in the /home/jenkins/jenkins-slave/workspace
> > >>>>>> directory. When I run a du on that, it takes a really long time.
> > >>>>>> I'll let it keep running for a while to see if it ever returns a
> > >>>>>> result, but so far this seems suspect.
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> On Fri, Jul 24, 2020 at 11:19 AM Tyson Hamilton <tyso...@google.com>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> Everything I've been looking at is in the /tmp dir. Where are the
> > >>>>>>> workspaces, and what are they named?
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Fri, Jul 24, 2020 at 11:03 AM Udi Meiri <eh...@google.com>
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>>> I'm curious what you find. Was it /tmp or the workspaces using
> > >>>>>>>> up the space?
> > >>>>>>>>
> > >>>>>>>> On Fri, Jul 24, 2020 at 10:57 AM Tyson Hamilton <tyso...@google.com>
> > >>>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> Bleck. I just realized that it is 'offline' so that won't work.
> > >>>>>>>>> I'll clean up manually on the machine using the cron command.
> > >>>>>>>>>
> > >>>>>>>>> On Fri, Jul 24, 2020 at 10:56 AM Tyson Hamilton <
> > >>>>>>>>> tyso...@google.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Something isn't working with the current setup, because node 15
> > >>>>>>>>>> appears to be out of space and is currently 'offline' according
> > >>>>>>>>>> to Jenkins. Can someone run the cleanup job? The machine is full:
> > >>>>>>>>>>
> > >>>>>>>>>> @apache-ci-beam-jenkins-15:/tmp$ df -h
> > >>>>>>>>>> Filesystem      Size  Used Avail Use% Mounted on
> > >>>>>>>>>> udev             52G     0   52G   0% /dev
> > >>>>>>>>>> tmpfs            11G  265M   10G   3% /run
> > >>>>>>>>>> /dev/sda1       485G  484G  880M 100% /
> > >>>>>>>>>> tmpfs            52G     0   52G   0% /dev/shm
> > >>>>>>>>>> tmpfs           5.0M     0  5.0M   0% /run/lock
> > >>>>>>>>>> tmpfs            52G     0   52G   0% /sys/fs/cgroup
> > >>>>>>>>>> tmpfs            11G     0   11G   0% /run/user/1017
> > >>>>>>>>>> tmpfs            11G     0   11G   0% /run/user/1037
> > >>>>>>>>>>
> > >>>>>>>>>> apache-ci-beam-jenkins-15:/tmp$ sudo du -ah --time . | sort -rhk 1,1 | head -n 20
> > >>>>>>>>>> 20G     2020-07-24 17:52        .
> > >>>>>>>>>> 580M    2020-07-22 17:31        ./junit1031982597110125586
> > >>>>>>>>>> 517M    2020-07-22 17:31        ./junit1031982597110125586/junit8739924829337821410/heap_dump.hprof
> > >>>>>>>>>> 517M    2020-07-22 17:31        ./junit1031982597110125586/junit8739924829337821410
> > >>>>>>>>>> 263M    2020-07-22 12:23        ./pip-install-2GUhO_
> > >>>>>>>>>> 263M    2020-07-20 09:30        ./pip-install-sxgwqr
> > >>>>>>>>>> 263M    2020-07-17 13:56        ./pip-install-bWSKIV
> > >>>>>>>>>> 242M    2020-07-21 20:25        ./beam-pipeline-tempmByU6T
> > >>>>>>>>>> 242M    2020-07-21 20:21        ./beam-pipeline-tempV85xeK
> > >>>>>>>>>> 242M    2020-07-21 20:15        ./beam-pipeline-temp7dJROJ
> > >>>>>>>>>> 236M    2020-07-21 20:25        ./beam-pipeline-tempmByU6T/tmpOWj3Yr
> > >>>>>>>>>> 236M    2020-07-21 20:21        ./beam-pipeline-tempV85xeK/tmppbQHB3
> > >>>>>>>>>> 236M    2020-07-21 20:15        ./beam-pipeline-temp7dJROJ/tmpgOXPKW
> > >>>>>>>>>> 111M    2020-07-23 00:57        ./pip-install-1JnyNE
> > >>>>>>>>>> 105M    2020-07-23 00:17        ./beam-artifact1374651823280819755
> > >>>>>>>>>> 105M    2020-07-23 00:16        ./beam-artifact5050755582921936972
> > >>>>>>>>>> 105M    2020-07-23 00:16        ./beam-artifact1834064452502646289
> > >>>>>>>>>> 105M    2020-07-23 00:15        ./beam-artifact682561790267074916
> > >>>>>>>>>> 105M    2020-07-23 00:15        ./beam-artifact4691304965824489394
> > >>>>>>>>>> 105M    2020-07-23 00:14        ./beam-artifact4050383819822604421
> > >>>>>>>>>>
> > >>>>>>>>>> On Wed, Jul 22, 2020 at 12:03 PM Robert Bradshaw <
> > >>>>>>>>>> rober...@google.com> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> On Wed, Jul 22, 2020 at 11:57 AM Tyson Hamilton <
> > >>>>>>>>>>> tyso...@google.com> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> Ah, I see, thanks Kenn. I found some advice on the Apache
> > >>>>>>>>>>>> infra wiki that also suggests using a tmpdir inside the
> > >>>>>>>>>>>> workspace [1]:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Procedures Projects can take to clean up disk space
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Projects can help themselves and Infra by taking some basic
> > >>>>>>>>>>>> steps to help clean up their jobs after themselves on the
> > >>>>>>>>>>>> build nodes.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>    1. Use a ./tmp dir in your jobs workspace. That way it
> > >>>>>>>>>>>>    gets cleaned up when job workspaces expire.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>> Tests should be (able to be) written to use the standard
> > >>>>>>>>>>> temporary file mechanisms, with the environment set up on
> > >>>>>>>>>>> Jenkins such that temp files fall into the respective
> > >>>>>>>>>>> workspaces. Ideally this should be as simple as setting the
> > >>>>>>>>>>> TMPDIR (or similar) environment variable (and making sure it
> > >>>>>>>>>>> exists/is writable).
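> > >>>>>>>>>>>
> > >>>>>>>>>>> Concretely, something like this at the start of each job's
> > >>>>>>>>>>> shell steps should suffice (a sketch; WORKSPACE is set by
> > >>>>>>>>>>> Jenkins):
> > >>>>>>>>>>>
> > >>>>>>>>>>>   export TMPDIR="${WORKSPACE}/tmp"  # temp files land in the workspace
> > >>>>>>>>>>>   mkdir -p "${TMPDIR}"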
> > >>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>    2. Configure your jobs to wipe workspaces on start or
> > >>>>>>>>>>>>    finish.
> > >>>>>>>>>>>>    3. Configure your jobs to only keep 5 or 10 previous
> > >>>>>>>>>>>>    builds.
> > >>>>>>>>>>>>    4. Configure your jobs to only keep 5 or 10 previous
> > >>>>>>>>>>>>    artifacts.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> [1]:
> > >>>>>>>>>>>> https://cwiki.apache.org/confluence/display/INFRA/Disk+Space+cleanup+of+Jenkins+nodes
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Wed, Jul 22, 2020 at 8:06 AM Kenneth Knowles <
> > >>>>>>>>>>>> k...@apache.org> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> Those file listings look like the result of using standard
> > >>>>>>>>>>>>> temp file APIs but with TMPDIR set to /tmp.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Mon, Jul 20, 2020 at 7:55 PM Tyson Hamilton <
> > >>>>>>>>>>>>> tyso...@google.com> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Jobs are hermetic as far as I can tell and use unique
> > >>>>>>>>>>>>>> subdirectories inside of /tmp. Here is a quick look at two
> > >>>>>>>>>>>>>> examples:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> @apache-ci-beam-jenkins-4:/tmp$ sudo du -ah --time . | sort -rhk 1,1 | head -n 20
> > >>>>>>>>>>>>>> 1.6G    2020-07-21 02:25        .
> > >>>>>>>>>>>>>> 242M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4
> > >>>>>>>>>>>>>> 242M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT
> > >>>>>>>>>>>>>> 242M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME
> > >>>>>>>>>>>>>> 242M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB
> > >>>>>>>>>>>>>> 242M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q
> > >>>>>>>>>>>>>> 242M    2020-07-17 18:35        ./beam-pipeline-temp79qot2
> > >>>>>>>>>>>>>> 236M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4/tmpy_Ytzz
> > >>>>>>>>>>>>>> 236M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT/tmpN5_UfJ
> > >>>>>>>>>>>>>> 236M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME/tmpxSm8pX
> > >>>>>>>>>>>>>> 236M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB/tmpMZJU76
> > >>>>>>>>>>>>>> 236M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q/tmpWy1vWX
> > >>>>>>>>>>>>>> 236M    2020-07-17 18:35        ./beam-pipeline-temp79qot2/tmpvN7vWA
> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4/tmprlh_di
> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT/tmpLmVWfe
> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME/tmpvrxbY7
> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB/tmpLTb6Mj
> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q/tmptYF1v1
> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:35        ./beam-pipeline-temp79qot2/tmplfV0Rg
> > >>>>>>>>>>>>>> 2.7M    2020-07-17 20:10        ./pip-install-q9l227ef
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> @apache-ci-beam-jenkins-11:/tmp$ sudo du -ah --time . | sort -rhk 1,1 | head -n 20
> > >>>>>>>>>>>>>> 817M    2020-07-21 02:26        .
> > >>>>>>>>>>>>>> 242M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM
> > >>>>>>>>>>>>>> 242M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3
> > >>>>>>>>>>>>>> 242M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq
> > >>>>>>>>>>>>>> 236M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpstXoL0
> > >>>>>>>>>>>>>> 236M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmpnnVn65
> > >>>>>>>>>>>>>> 236M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpRF0iNs
> > >>>>>>>>>>>>>> 3.7M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpbJjUAQ
> > >>>>>>>>>>>>>> 3.7M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmpsmmzqe
> > >>>>>>>>>>>>>> 3.7M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmp5b3ZvY
> > >>>>>>>>>>>>>> 2.0M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpoj3orz
> > >>>>>>>>>>>>>> 2.0M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmptng9sZ
> > >>>>>>>>>>>>>> 2.0M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpWp6njc
> > >>>>>>>>>>>>>> 1.2M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmphgdj35
> > >>>>>>>>>>>>>> 1.2M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmp8ySXpm
> > >>>>>>>>>>>>>> 1.2M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpNVEJ4e
> > >>>>>>>>>>>>>> 992K    2020-07-12 12:00        ./junit642086915811430564
> > >>>>>>>>>>>>>> 988K    2020-07-12 12:00        ./junit642086915811430564/beam
> > >>>>>>>>>>>>>> 984K    2020-07-12 12:00        ./junit642086915811430564/beam/nodes
> > >>>>>>>>>>>>>> 980K    2020-07-12 12:00        ./junit642086915811430564/beam/nodes/0
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Mon, Jul 20, 2020 at 6:46 PM Udi Meiri <eh...@google.com>
> > >>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> You're right, job workspaces should be hermetic.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles <
> > >>>>>>>>>>>>>>> k...@apache.org> wrote:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> I'm probably late to this discussion and missing
> > >>>>>>>>>>>>>>>> something, but why are we writing to /tmp at all? I would
> > >>>>>>>>>>>>>>>> expect TMPDIR to point somewhere inside the job directory
> > >>>>>>>>>>>>>>>> that will be wiped by Jenkins, and I would expect code to
> > >>>>>>>>>>>>>>>> always create temp files via APIs that respect this. Is
> > >>>>>>>>>>>>>>>> Jenkins not cleaning up? Do we not have the ability to set
> > >>>>>>>>>>>>>>>> this up? Do we have bugs in our code (that we could
> > >>>>>>>>>>>>>>>> probably find by setting TMPDIR to somewhere not-/tmp and
> > >>>>>>>>>>>>>>>> running the tests without write permission to /tmp, etc)?
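> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> The experiment could look roughly like this (a sketch,
> > >>>>>>>>>>>>>>>> not an existing script):
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>   export TMPDIR="$(mktemp -d -p "$WORKSPACE")"  # not /tmp
> > >>>>>>>>>>>>>>>>   sudo chmod a-w /tmp    # stray /tmp writes now fail loudly
> > >>>>>>>>>>>>>>>>   ./gradlew check        # offenders surface as test errors
> > >>>>>>>>>>>>>>>>   sudo chmod 1777 /tmp   # restore default /tmp permissions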
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Kenn
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay <
> > >>>>>>>>>>>>>>>> al...@google.com> wrote:
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Related to workspace directory growth, +Udi Meiri
> > >>>>>>>>>>>>>>>>> <eh...@google.com> filed a relevant issue previously
> > >>>>>>>>>>>>>>>>> (https://issues.apache.org/jira/browse/BEAM-9865) for
> > >>>>>>>>>>>>>>>>> cleaning up the workspace directory after successful
> > >>>>>>>>>>>>>>>>> jobs. Alternatively, we can consider periodically
> > >>>>>>>>>>>>>>>>> cleaning up the /src directories.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> I would suggest moving the cron task from internal cron
> > >>>>>>>>>>>>>>>>> scripts to the inventory job
> > >>>>>>>>>>>>>>>>> (https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51).
> > >>>>>>>>>>>>>>>>> That way, we can see all the cron tasks as part of the
> > >>>>>>>>>>>>>>>>> source tree, and adjust frequencies and clean up code
> > >>>>>>>>>>>>>>>>> with PRs. I do not know how the internal cron scripts are
> > >>>>>>>>>>>>>>>>> created and maintained, or how they would be recreated
> > >>>>>>>>>>>>>>>>> for new worker instances.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> /cc +Tyson Hamilton <tyso...@google.com>
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski <
> > >>>>>>>>>>>>>>>>> damian.gadom...@polidea.com> wrote:
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Hey,
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> I've recently created a solution for the growing /tmp
> > >>>>>>>>>>>>>>>>>> directory. Part of it is the job mentioned by Tyson:
> > >>>>>>>>>>>>>>>>>> *beam_Clean_tmp_directory*. It's intentionally not
> > >>>>>>>>>>>>>>>>>> triggered by cron and should be a last-resort solution
> > >>>>>>>>>>>>>>>>>> for unusual cases.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Along with that job, I've also updated every worker with
> > >>>>>>>>>>>>>>>>>> an internal cron script. It's executed once a week and
> > >>>>>>>>>>>>>>>>>> deletes all the files (and only files) that were not
> > >>>>>>>>>>>>>>>>>> accessed for at least three days. That's designed to be
> > >>>>>>>>>>>>>>>>>> as safe as possible for the jobs running on the worker
> > >>>>>>>>>>>>>>>>>> (so as not to delete files that are still in use), and
> > >>>>>>>>>>>>>>>>>> also to be insensitive to the current workload on the
> > >>>>>>>>>>>>>>>>>> machine. The cleanup will always happen, even if some
> > >>>>>>>>>>>>>>>>>> long-running/stuck jobs are blocking the machine.
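> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> The idea is roughly the following (a sketch, not the
> > >>>>>>>>>>>>>>>>>> exact script installed on the workers):
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>   # Weekly: delete files (and only files) under /tmp whose
> > >>>>>>>>>>>>>>>>>>   # last access time is at least three days old.
> > >>>>>>>>>>>>>>>>>>   0 6 * * 0 root find /tmp -type f -atime +3 -delete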
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> I also think that currently the "No space left" errors
> > >>>>>>>>>>>>>>>>>> may be a consequence of the growing workspace directory
> > >>>>>>>>>>>>>>>>>> rather than /tmp. I didn't do any detailed analysis,
> > >>>>>>>>>>>>>>>>>> but, e.g., currently on apache-beam-jenkins-7 the
> > >>>>>>>>>>>>>>>>>> workspace directory size is 158 GB while /tmp is only
> > >>>>>>>>>>>>>>>>>> 16 GB. We should either guarantee that the disk is large
> > >>>>>>>>>>>>>>>>>> enough to hold workspaces for all jobs (because
> > >>>>>>>>>>>>>>>>>> eventually every worker will execute each job) or also
> > >>>>>>>>>>>>>>>>>> clear the workspaces in some way.
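> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> To see which workspaces dominate on a worker, something
> > >>>>>>>>>>>>>>>>>> like this works (path as on our workers):
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>   sudo du -sh /home/jenkins/jenkins-slave/workspace/* | sort -rh | head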
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Regards,
> > >>>>>>>>>>>>>>>>>> Damian
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 10:43 AM Maximilian Michels <
> > >>>>>>>>>>>>>>>>>> m...@apache.org> wrote:
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> +1 for scheduling it via a cron job if it won't lead to
> > >>>>>>>>>>>>>>>>>>> test failures while running. I'm not a Jenkins expert,
> > >>>>>>>>>>>>>>>>>>> but maybe there is a notion of running a job exclusively,
> > >>>>>>>>>>>>>>>>>>> while no other tasks are running?
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> -Max
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> On 17.07.20 21:49, Tyson Hamilton wrote:
> > >>>>>>>>>>>>>>>>>>> > FYI, there was a job introduced to do this in
> > >>>>>>>>>>>>>>>>>>> > Jenkins: beam_Clean_tmp_directory
> > >>>>>>>>>>>>>>>>>>> >
> > >>>>>>>>>>>>>>>>>>> > Currently it needs to be run manually. I'm seeing
> > >>>>>>>>>>>>>>>>>>> > some out-of-disk-related errors in precommit tests;
> > >>>>>>>>>>>>>>>>>>> > perhaps we should schedule this job with cron?
> > >>>>>>>>>>>>>>>>>>> >
> > >>>>>>>>>>>>>>>>>>> >
> > >>>>>>>>>>>>>>>>>>> > On 2020/03/11 19:31:13, Heejong Lee <heej...@google.com> wrote:
> > >>>>>>>>>>>>>>>>>>> >> Still seeing no space left on device errors on
> > >>>>>>>>>>>>>>>>>>> >> jenkins-7 (for example:
> > >>>>>>>>>>>>>>>>>>> >> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/)
> > >>>>>>>>>>>>>>>>>>> >>
> > >>>>>>>>>>>>>>>>>>> >>
> > >>>>>>>>>>>>>>>>>>> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <amyrv...@google.com> wrote:
> > >>>>>>>>>>>>>>>>>>> >>
> > >>>>>>>>>>>>>>>>>>> >>> Did a one-time cleanup of tmp files owned by jenkins
> > >>>>>>>>>>>>>>>>>>> >>> older than 3 days.
> > >>>>>>>>>>>>>>>>>>> >>> Agree that we need a longer-term solution.
> > >>>>>>>>>>>>>>>>>>> >>>
> > >>>>>>>>>>>>>>>>>>> >>> Passing recent tests on all executors except
> > >>>>>>>>>>>>>>>>>>> >>> jenkins-12, which has not scheduled recent builds for
> > >>>>>>>>>>>>>>>>>>> >>> the past 13 days. Not scheduling:
> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
> > >>>>>>>>>>>>>>>>>>> >>>
> > >>>>>>>>>>>>>>>>>>> >>> Recent passing builds:
> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
> > >>>>>>>>>>>>>>>>>>> >>>
> > >>>>>>>>>>>>>>>>>>> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <al...@google.com> wrote:
> > >>>>>>>>>>>>>>>>>>> >>>
> > >>>>>>>>>>>>>>>>>>> >>>> +Alan Myrvold <amyrv...@google.com> is doing a
> > >>>>>>>>>>>>>>>>>>> >>>> one-time cleanup. I agree that we need a solution
> > >>>>>>>>>>>>>>>>>>> >>>> to automate this task or address the root cause of
> > >>>>>>>>>>>>>>>>>>> >>>> the buildup.
> > >>>>>>>>>>>>>>>>>>> >>>>
> > >>>>>>>>>>>>>>>>>>> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia
> > >>>>>>>>>>>>>>>>>>> >>>> <michal.wale...@polidea.com> wrote:
> > >>>>>>>>>>>>>>>>>>> >>>>
> > >>>>>>>>>>>>>>>>>>> >>>>> Hi there,
> > >>>>>>>>>>>>>>>>>>> >>>>> it seems we have a problem with the Jenkins
> > >>>>>>>>>>>>>>>>>>> >>>>> workers again. Nodes 1 and 7 both fail jobs with
> > >>>>>>>>>>>>>>>>>>> >>>>> "No space left on device". Who is the best person
> > >>>>>>>>>>>>>>>>>>> >>>>> to contact in these cases (someone with access
> > >>>>>>>>>>>>>>>>>>> >>>>> permissions to the workers)?
> > >>>>>>>>>>>>>>>>>>> >>>>>
> > >>>>>>>>>>>>>>>>>>> >>>>> I also noticed that such errors are becoming
> > >>>>>>>>>>>>>>>>>>> >>>>> more and more frequent recently, and I'd like to
> > >>>>>>>>>>>>>>>>>>> >>>>> discuss how this can be remedied. Can a cleanup
> > >>>>>>>>>>>>>>>>>>> >>>>> task be automated on Jenkins somehow?
> > >>>>>>>>>>>>>>>>>>> >>>>>
> > >>>>>>>>>>>>>>>>>>> >>>>> Regards
> > >>>>>>>>>>>>>>>>>>> >>>>> Michal
> > >>>>>>>>>>>>>>>>>>> >>>>>
> > >>>>>>>>>>>>>>>>>>> >>>>> --
> > >>>>>>>>>>>>>>>>>>> >>>>>
> > >>>>>>>>>>>>>>>>>>> >>>>> Michał Walenia
> > >>>>>>>>>>>>>>>>>>> >>>>> Polidea <https://www.polidea.com/> | Software Engineer
> > >>>>>>>>>>>>>>>>>>> >>>>>
> > >>>>>>>>>>>>>>>>>>> >>>>> M: +48 791 432 002
> > >>>>>>>>>>>>>>>>>>> >>>>> E: michal.wale...@polidea.com
> > >>>>>>>>>>>>>>>>>>> >>>>>
> > >>>>>>>>>>>>>>>>>>> >>>>> Unique Tech
> > >>>>>>>>>>>>>>>>>>> >>>>> Check out our projects! <https://www.polidea.com/our-work>
> > >>>>>>>>>>>>>>>>>>> >>>>>
> > >>>>>>>>>>>>>>>>>>> >>>>
> > >>>>>>>>>>>>>>>>>>> >>
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> >
>
