Ah I see, thanks Kenn. I found some advice from the Apache infra wiki that
also suggests using a tmpdir inside the workspace [1]:

Procedures Projects can take to clean up disk space

Projects can help themselves and Infra by taking some basic steps to clean
up after their jobs on the build nodes.



   1. Use a ./tmp dir in your job's workspace. That way it gets cleaned up
   when job workspaces expire.
   2. Configure your jobs to wipe workspaces on start or finish.
   3. Configure your jobs to only keep 5 or 10 previous builds.
   4. Configure your jobs to only keep 5 or 10 previous artifacts.



[1]:
https://cwiki.apache.org/confluence/display/INFRA/Disk+Space+cleanup+of+Jenkins+nodes
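
For (1), the fix on our side might be as simple as exporting TMPDIR into the
workspace before the tests run, since the standard temp file APIs Kenn
mentioned all honor it. A minimal Python sketch of the behavior (using
$WORKSPACE/tmp is my assumption, not something the wiki prescribes):

    import os
    import tempfile

    # Jenkins sets $WORKSPACE on executors; fall back to "." locally.
    workspace_tmp = os.path.join(os.environ.get("WORKSPACE", "."), "tmp")
    os.makedirs(workspace_tmp, exist_ok=True)
    os.environ["TMPDIR"] = workspace_tmp

    # tempfile caches its default directory, so clear the cache after
    # changing the environment.
    tempfile.tempdir = None

    # This now lands under $WORKSPACE/tmp instead of /tmp, mirroring the
    # beam-pipeline-temp* directories in Tyson's listings.
    print(tempfile.mkdtemp(prefix="beam-pipeline-temp"))

If that holds, the beam-pipeline-temp* directories would expire together with
the job workspace, which addresses (1). Items (3) and (4) look like per-job
retention settings we could presumably set in our Groovy job definitions.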

On Wed, Jul 22, 2020 at 8:06 AM Kenneth Knowles <k...@apache.org> wrote:

> Those file listings look like the result of using standard temp file APIs
> but with TMPDIR set to /tmp.
>
> On Mon, Jul 20, 2020 at 7:55 PM Tyson Hamilton <tyso...@google.com> wrote:
>
>> Jobs are hermetic as far as I can tell and use unique subdirectories
>> inside of /tmp. Here is a quick look into two examples:
>>
>> @apache-ci-beam-jenkins-4:/tmp$ sudo du -ah --time . | sort -rhk 1,1 |
>> head -n 20
>> 1.6G    2020-07-21 02:25        .
>> 242M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4
>> 242M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT
>> 242M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME
>> 242M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB
>> 242M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q
>> 242M    2020-07-17 18:35        ./beam-pipeline-temp79qot2
>> 236M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4/tmpy_Ytzz
>> 236M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT/tmpN5_UfJ
>> 236M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME/tmpxSm8pX
>> 236M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB/tmpMZJU76
>> 236M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q/tmpWy1vWX
>> 236M    2020-07-17 18:35        ./beam-pipeline-temp79qot2/tmpvN7vWA
>> 3.7M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4/tmprlh_di
>> 3.7M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT/tmpLmVWfe
>> 3.7M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME/tmpvrxbY7
>> 3.7M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB/tmpLTb6Mj
>> 3.7M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q/tmptYF1v1
>> 3.7M    2020-07-17 18:35        ./beam-pipeline-temp79qot2/tmplfV0Rg
>> 2.7M    2020-07-17 20:10        ./pip-install-q9l227ef
>>
>>
>> @apache-ci-beam-jenkins-11:/tmp$ sudo du -ah --time . | sort -rhk 1,1 |
>> head -n 20
>> 817M    2020-07-21 02:26        .
>> 242M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM
>> 242M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3
>> 242M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq
>> 236M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpstXoL0
>> 236M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmpnnVn65
>> 236M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpRF0iNs
>> 3.7M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpbJjUAQ
>> 3.7M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmpsmmzqe
>> 3.7M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmp5b3ZvY
>> 2.0M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpoj3orz
>> 2.0M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmptng9sZ
>> 2.0M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpWp6njc
>> 1.2M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmphgdj35
>> 1.2M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmp8ySXpm
>> 1.2M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpNVEJ4e
>> 992K    2020-07-12 12:00        ./junit642086915811430564
>> 988K    2020-07-12 12:00        ./junit642086915811430564/beam
>> 984K    2020-07-12 12:00        ./junit642086915811430564/beam/nodes
>> 980K    2020-07-12 12:00        ./junit642086915811430564/beam/nodes/0
>>
>>
>>
>> On Mon, Jul 20, 2020 at 6:46 PM Udi Meiri <eh...@google.com> wrote:
>>
>>> You're right, job workspaces should be hermetic.
>>>
>>>
>>>
>>> On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles <k...@apache.org> wrote:
>>>
>>>> I'm probably late to this discussion and missing something, but why are
>>>> we writing to /tmp at all? I would expect TMPDIR to point somewhere inside
>>>> the job directory that will be wiped by Jenkins, and I would expect code to
>>>> always create temp files via APIs that respect this. Is Jenkins not
>>>> cleaning up? Do we not have the ability to set this up? Do we have bugs in
>>>> our code (that we could probably find by setting TMPDIR to somewhere
>>>> not-/tmp and running the tests without write permission to /tmp, etc.)?
>>>>
>>>> Kenn
>>>>
>>>> On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay <al...@google.com> wrote:
>>>>
>>>>> Related to workspace directory growth, +Udi Meiri <eh...@google.com>
>>>>> previously filed a relevant issue
>>>>> (https://issues.apache.org/jira/browse/BEAM-9865) for cleaning up the
>>>>> workspace directory after successful jobs. Alternatively, we can
>>>>> consider periodically cleaning up the /src directories.
>>>>>
>>>>> I would suggest moving the cron task from internal cron scripts to the
>>>>> inventory job (
>>>>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51).
>>>>> That way, we can see all the cron jobs as part of the source tree,
>>>>> adjust frequencies, and clean up code with PRs. I do not know how the
>>>>> internal cron scripts are created and maintained, or how they would be
>>>>> recreated for new worker instances.
>>>>>
>>>>> /cc +Tyson Hamilton <tyso...@google.com>
>>>>>
>>>>> On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski <
>>>>> damian.gadom...@polidea.com> wrote:
>>>>>
>>>>>> Hey,
>>>>>>
>>>>>> I've recently created a solution for the growing /tmp directory. Part
>>>>>> of it is the job mentioned by Tyson: *beam_Clean_tmp_directory*.
>>>>>> It's intentionally not triggered by cron and should be a last-resort
>>>>>> solution for some strange cases.
>>>>>>
>>>>>> Along with that job, I've also updated every worker with an internal
>>>>>> cron script. It's being executed once a week and deletes all the files
>>>>>> (and only files) that were not accessed for at least three days. That's
>>>>>> designed to be as safe as possible for the running jobs on the worker
>>>>>> (not to delete the files that are still in use), and also to be
>>>>>> insensitive to the current workload on the machine. The cleanup will
>>>>>> always happen, even if some long-running/stuck jobs are blocking the
>>>>>> machine.
>>>>>>
>>>>>> I also think that the current "No space left" errors may be a
>>>>>> consequence of the growing workspace directory rather than /tmp. I
>>>>>> didn't do any detailed analysis, but currently on apache-beam-jenkins-7
>>>>>> the workspace directory size is 158 GB while /tmp is only 16 GB. We
>>>>>> should either guarantee enough disk to hold workspaces for all jobs
>>>>>> (because eventually, every worker will execute each job) or also clear
>>>>>> the workspaces in some way.
>>>>>>
>>>>>> Regards,
>>>>>> Damian
>>>>>>
>>>>>>
>>>>>> On Mon, Jul 20, 2020 at 10:43 AM Maximilian Michels <m...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> +1 for scheduling it via a cron job if it won't lead to test failures
>>>>>>> while running. Not a Jenkins expert, but maybe there is a notion of
>>>>>>> running exclusively while no other tasks are running?
>>>>>>>
>>>>>>> -Max
>>>>>>>
>>>>>>> On 17.07.20 21:49, Tyson Hamilton wrote:
>>>>>>> > FYI there was a job introduced to do this in Jenkins:
>>>>>>> > beam_Clean_tmp_directory
>>>>>>> >
>>>>>>> > Currently it needs to be run manually. I'm seeing some
>>>>>>> > out-of-disk-related errors in precommit tests; perhaps we should
>>>>>>> > schedule this job with cron?
>>>>>>> >
>>>>>>> >
>>>>>>> > On 2020/03/11 19:31:13, Heejong Lee <heej...@google.com> wrote:
>>>>>>> >> Still seeing "no space left on device" errors on jenkins-7 (for
>>>>>>> >> example:
>>>>>>> >> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/)
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <amyrv...@google.com> wrote:
>>>>>>> >>
>>>>>>> >>> Did a one-time cleanup of tmp files owned by jenkins older than 3
>>>>>>> >>> days. Agree that we need a longer-term solution.
>>>>>>> >>>
>>>>>>> >>> Passing recent tests on all executors except jenkins-12, which has
>>>>>>> >>> not scheduled recent builds for the past 13 days. Not scheduling:
>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
>>>>>>> >>> Recent passing builds:
>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
>>>>>>> >>>
>>>>>>> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <al...@google.com>
>>>>>>> wrote:
>>>>>>> >>>
>>>>>>> >>>> +Alan Myrvold <amyrv...@google.com> is doing a one time
>>>>>>> cleanup. I agree
>>>>>>> >>>> that we need to have a solution to automate this task or
>>>>>>> address the root
>>>>>>> >>>> cause of the buildup.
>>>>>>> >>>>
>>>>>>> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia <
>>>>>>> michal.wale...@polidea.com>
>>>>>>> >>>> wrote:
>>>>>>> >>>>
>>>>>>> >>>>> Hi there,
>>>>>>> >>>>> it seems we have a problem with Jenkins workers again. Nodes 1
>>>>>>> and 7
>>>>>>> >>>>> both fail jobs with "No space left on device".
>>>>>>> >>>>> Who is the best person to contact in these cases (someone with
>>>>>>> access
>>>>>>> >>>>> permissions to the workers).
>>>>>>> >>>>>
>>>>>>> >>>>> I also noticed that such errors are becoming more and more
>>>>>>> frequent
>>>>>>> >>>>> recently and I'd like to discuss how can this be remedied. Can
>>>>>>> a cleanup
>>>>>>> >>>>> task be automated on Jenkins somehow?
>>>>>>> >>>>>
>>>>>>> >>>>> Regards
>>>>>>> >>>>> Michal
>>>>>>> >>>>>
>>>>>>> >>>>> --
>>>>>>> >>>>>
>>>>>>> >>>>> Michał Walenia
>>>>>>> >>>>> Polidea <https://www.polidea.com/> | Software Engineer
>>>>>>> >>>>>
>>>>>>> >>>>> M: +48 791 432 002
>>>>>>> >>>>> E: michal.wale...@polidea.com
>>>>>>> >>>>>
>>>>>>> >>>>> Unique Tech
>>>>>>> >>>>> Check out our projects! <https://www.polidea.com/our-work>
>>>>>>> >>>>>
>>>>>>> >>>>
>>>>>>> >>
>>>>>>>
>>>>>>
