Bleck. I just realized that the node is 'offline', so running the cleanup job
won't work. I'll clean up manually on the machine using the command from the
cron script.

On Fri, Jul 24, 2020 at 10:56 AM Tyson Hamilton <tyso...@google.com> wrote:

> Something isn't working with the current setup because node 15 appears to
> be out of space and is currently 'offline' according to Jenkins. Can
> someone run the cleanup job? The machine is full:
>
> @apache-ci-beam-jenkins-15:/tmp$ df -h
> Filesystem      Size  Used Avail Use% Mounted on
> udev             52G     0   52G   0% /dev
> tmpfs            11G  265M   10G   3% /run
> /dev/sda1       485G  484G  880M 100% /
> tmpfs            52G     0   52G   0% /dev/shm
> tmpfs           5.0M     0  5.0M   0% /run/lock
> tmpfs            52G     0   52G   0% /sys/fs/cgroup
> tmpfs            11G     0   11G   0% /run/user/1017
> tmpfs            11G     0   11G   0% /run/user/1037
>
> apache-ci-beam-jenkins-15:/tmp$ sudo du -ah --time . | sort -rhk 1,1 | head -n 20
> 20G     2020-07-24 17:52        .
> 580M    2020-07-22 17:31        ./junit1031982597110125586
> 517M    2020-07-22 17:31        ./junit1031982597110125586/junit8739924829337821410/heap_dump.hprof
> 517M    2020-07-22 17:31        ./junit1031982597110125586/junit8739924829337821410
> 263M    2020-07-22 12:23        ./pip-install-2GUhO_
> 263M    2020-07-20 09:30        ./pip-install-sxgwqr
> 263M    2020-07-17 13:56        ./pip-install-bWSKIV
> 242M    2020-07-21 20:25        ./beam-pipeline-tempmByU6T
> 242M    2020-07-21 20:21        ./beam-pipeline-tempV85xeK
> 242M    2020-07-21 20:15        ./beam-pipeline-temp7dJROJ
> 236M    2020-07-21 20:25        ./beam-pipeline-tempmByU6T/tmpOWj3Yr
> 236M    2020-07-21 20:21        ./beam-pipeline-tempV85xeK/tmppbQHB3
> 236M    2020-07-21 20:15        ./beam-pipeline-temp7dJROJ/tmpgOXPKW
> 111M    2020-07-23 00:57        ./pip-install-1JnyNE
> 105M    2020-07-23 00:17        ./beam-artifact1374651823280819755
> 105M    2020-07-23 00:16        ./beam-artifact5050755582921936972
> 105M    2020-07-23 00:16        ./beam-artifact1834064452502646289
> 105M    2020-07-23 00:15        ./beam-artifact682561790267074916
> 105M    2020-07-23 00:15        ./beam-artifact4691304965824489394
> 105M    2020-07-23 00:14        ./beam-artifact4050383819822604421
>
> On Wed, Jul 22, 2020 at 12:03 PM Robert Bradshaw <rober...@google.com>
> wrote:
>
>> On Wed, Jul 22, 2020 at 11:57 AM Tyson Hamilton <tyso...@google.com>
>> wrote:
>>
>>> Ah I see, thanks Kenn. I found some advice from the Apache infra wiki
>>> that also suggests using a tmpdir inside the workspace [1]:
>>>
>>> Procedures Projects can take to clean up disk space
>>>
>>> Projects can help themselves and Infra by taking some basic steps to
>>> help clean up their jobs after themselves on the build nodes.
>>>
>>>
>>>
>>>    1. Use a ./tmp dir in your jobs workspace. That way it gets cleaned
>>>    up when job workspaces expire.
>>>
>>>
>> Tests should be (able to be) written to use the standard temporary file
>> mechanisms, with the environment on Jenkins set up so that those land in
>> the respective workspaces. Ideally this should be as simple as setting the
>> TMPDIR (or similar) environment variable (and making sure it exists and is
>> writable).
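>>
>> As a minimal sketch (paths illustrative, not necessarily our actual job
>> config), a job's shell step could do:
>>
>>   # Send standard temp-file APIs (Python's tempfile, the mktemp command,
>>   # and anything else honoring TMPDIR) into the workspace, which Jenkins
>>   # wipes or expires along with the rest of the job directory.
>>   export TMPDIR="$WORKSPACE/tmp"
>>   mkdir -p "$TMPDIR"
>>
>> Note the JVM ignores TMPDIR, so Java tests would additionally need
>> -Djava.io.tmpdir="$TMPDIR" (or equivalent) passed to the test runner.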
>>
>>>
>>>    2. Configure your jobs to wipe workspaces on start or finish.
>>>    3. Configure your jobs to only keep 5 or 10 previous builds.
>>>    4. Configure your jobs to only keep 5 or 10 previous artifacts.
>>>
>>>
>>>
>>> [1]:
>>> https://cwiki.apache.org/confluence/display/INFRA/Disk+Space+cleanup+of+Jenkins+nodes
>>>
>>> On Wed, Jul 22, 2020 at 8:06 AM Kenneth Knowles <k...@apache.org> wrote:
>>>
>>>> Those file listings look like the result of using standard temp file
>>>> APIs but with TMPDIR set to /tmp.
>>>>
>>>> On Mon, Jul 20, 2020 at 7:55 PM Tyson Hamilton <tyso...@google.com>
>>>> wrote:
>>>>
>>>>> Jobs are hermetic as far as I can tell and use unique subdirectories
>>>>> inside of /tmp. Here is a quick look into two examples:
>>>>>
>>>>> @apache-ci-beam-jenkins-4:/tmp$ sudo du -ah --time . | sort -rhk 1,1 | head -n 20
>>>>> 1.6G    2020-07-21 02:25        .
>>>>> 242M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4
>>>>> 242M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT
>>>>> 242M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME
>>>>> 242M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB
>>>>> 242M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q
>>>>> 242M    2020-07-17 18:35        ./beam-pipeline-temp79qot2
>>>>> 236M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4/tmpy_Ytzz
>>>>> 236M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT/tmpN5_UfJ
>>>>> 236M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME/tmpxSm8pX
>>>>> 236M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB/tmpMZJU76
>>>>> 236M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q/tmpWy1vWX
>>>>> 236M    2020-07-17 18:35        ./beam-pipeline-temp79qot2/tmpvN7vWA
>>>>> 3.7M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4/tmprlh_di
>>>>> 3.7M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT/tmpLmVWfe
>>>>> 3.7M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME/tmpvrxbY7
>>>>> 3.7M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB/tmpLTb6Mj
>>>>> 3.7M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q/tmptYF1v1
>>>>> 3.7M    2020-07-17 18:35        ./beam-pipeline-temp79qot2/tmplfV0Rg
>>>>> 2.7M    2020-07-17 20:10        ./pip-install-q9l227ef
>>>>>
>>>>>
>>>>> @apache-ci-beam-jenkins-11:/tmp$ sudo du -ah --time . | sort -rhk 1,1 | head -n 20
>>>>> 817M    2020-07-21 02:26        .
>>>>> 242M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM
>>>>> 242M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3
>>>>> 242M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq
>>>>> 236M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpstXoL0
>>>>> 236M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmpnnVn65
>>>>> 236M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpRF0iNs
>>>>> 3.7M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpbJjUAQ
>>>>> 3.7M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmpsmmzqe
>>>>> 3.7M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmp5b3ZvY
>>>>> 2.0M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpoj3orz
>>>>> 2.0M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmptng9sZ
>>>>> 2.0M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpWp6njc
>>>>> 1.2M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmphgdj35
>>>>> 1.2M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmp8ySXpm
>>>>> 1.2M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpNVEJ4e
>>>>> 992K    2020-07-12 12:00        ./junit642086915811430564
>>>>> 988K    2020-07-12 12:00        ./junit642086915811430564/beam
>>>>> 984K    2020-07-12 12:00        ./junit642086915811430564/beam/nodes
>>>>> 980K    2020-07-12 12:00        ./junit642086915811430564/beam/nodes/0
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Jul 20, 2020 at 6:46 PM Udi Meiri <eh...@google.com> wrote:
>>>>>
>>>>>> You're right, job workspaces should be hermetic.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles <k...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> I'm probably late to this discussion and missing something, but why
>>>>>>> are we writing to /tmp at all? I would expect TMPDIR to point somewhere
>>>>>>> inside the job directory that will be wiped by Jenkins, and I would
>>>>>>> expect code to always create temp files via APIs that respect this. Is
>>>>>>> Jenkins not cleaning up? Do we not have the ability to set this up? Do
>>>>>>> we have bugs in our code (that we could probably find by setting TMPDIR
>>>>>>> to somewhere not-/tmp and running the tests without write permission to
>>>>>>> /tmp, etc.)?
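>>>>>>>
>>>>>>> A rough sketch of that bug-hunt (commands illustrative, for a scratch
>>>>>>> machine rather than a live worker):
>>>>>>>
>>>>>>>   # Point well-behaved temp-file APIs elsewhere, then make /tmp
>>>>>>>   # unwritable so anything hard-coding /tmp fails loudly in the run.
>>>>>>>   export TMPDIR="$HOME/job-tmp" && mkdir -p "$TMPDIR"
>>>>>>>   sudo chmod a-w /tmp    # restore afterwards: sudo chmod 1777 /tmp
>>>>>>>   ./gradlew test         # hypothetical test invocation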
>>>>>>>
>>>>>>> Kenn
>>>>>>>
>>>>>>> On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay <al...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Related to workspace directory growth, +Udi Meiri <eh...@google.com>
>>>>>>>> filed a relevant issue previously
>>>>>>>> (https://issues.apache.org/jira/browse/BEAM-9865) for cleaning up the
>>>>>>>> workspace directory after successful jobs. Alternatively, we can
>>>>>>>> consider periodically cleaning up the /src directories.
>>>>>>>>
>>>>>>>> I would suggest moving the cron task from internal cron scripts to
>>>>>>>> the inventory job
>>>>>>>> (https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51).
>>>>>>>> That way, we can see all the cron jobs as part of the source tree, and
>>>>>>>> adjust frequencies and clean up code with PRs. I do not know how the
>>>>>>>> internal cron scripts are created and maintained, or how they would be
>>>>>>>> recreated for new worker instances.
>>>>>>>>
>>>>>>>> /cc +Tyson Hamilton <tyso...@google.com>
>>>>>>>>
>>>>>>>> On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski <
>>>>>>>> damian.gadom...@polidea.com> wrote:
>>>>>>>>
>>>>>>>>> Hey,
>>>>>>>>>
>>>>>>>>> I've recently created a solution for the growing /tmp directory.
>>>>>>>>> Part of it is the job mentioned by Tyson: beam_Clean_tmp_directory.
>>>>>>>>> It's intentionally not triggered by cron and should be a last-resort
>>>>>>>>> solution for some strange cases.
>>>>>>>>>
>>>>>>>>> Along with that job, I've also updated every worker with an internal
>>>>>>>>> cron script. It runs once a week and deletes all the files (and only
>>>>>>>>> files) that were not accessed for at least three days. That's
>>>>>>>>> designed to be as safe as possible for jobs running on the worker (so
>>>>>>>>> files still in use are not deleted), and also to be insensitive to
>>>>>>>>> the current workload on the machine: the cleanup will always happen,
>>>>>>>>> even if some long-running/stuck jobs are blocking the machine.
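>>>>>>>>>
>>>>>>>>> (For reference, the deletion step amounts to something like the
>>>>>>>>> following sketch; the exact script on the workers may differ:
>>>>>>>>>
>>>>>>>>>   # Delete only regular files under /tmp not accessed for more than
>>>>>>>>>   # three days; directories and recently-touched files are left alone.
>>>>>>>>>   find /tmp -type f -atime +3 -delete
>>>>>>>>> )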
>>>>>>>>>
>>>>>>>>> I also think that the current "No space left" errors may be a
>>>>>>>>> consequence of a growing workspace directory rather than /tmp. I
>>>>>>>>> didn't do any detailed analysis, but currently, on
>>>>>>>>> apache-beam-jenkins-7, the workspace directory is 158 GB while /tmp
>>>>>>>>> is only 16 GB. We should either guarantee that the disk is large
>>>>>>>>> enough to hold workspaces for all jobs (because eventually every
>>>>>>>>> worker will execute each job) or also clear the workspaces in some
>>>>>>>>> way.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Damian
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Jul 20, 2020 at 10:43 AM Maximilian Michels <
>>>>>>>>> m...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> +1 for scheduling it via a cron job if it won't lead to test
>>>>>>>>>> failures while running. Not a Jenkins expert, but maybe there is a
>>>>>>>>>> notion of running exclusively while no other tasks are running?
>>>>>>>>>>
>>>>>>>>>> -Max
>>>>>>>>>>
>>>>>>>>>> On 17.07.20 21:49, Tyson Hamilton wrote:
>>>>>>>>>> > FYI there was a job introduced to do this in Jenkins:
>>>>>>>>>> > beam_Clean_tmp_directory.
>>>>>>>>>> >
>>>>>>>>>> > Currently it needs to be run manually. I'm seeing some out-of-disk
>>>>>>>>>> > errors in precommit tests; perhaps we should schedule this job
>>>>>>>>>> > with cron?
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > On 2020/03/11 19:31:13, Heejong Lee <heej...@google.com> wrote:
>>>>>>>>>> >> Still seeing "no space left on device" errors on jenkins-7 (for
>>>>>>>>>> >> example:
>>>>>>>>>> >> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/)
>>>>>>>>>> >>
>>>>>>>>>> >>
>>>>>>>>>> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <
>>>>>>>>>> amyrv...@google.com> wrote:
>>>>>>>>>> >>
>>>>>>>>>> >>> Did a one time cleanup of tmp files owned by jenkins older
>>>>>>>>>> >>> than 3 days. Agree that we need a longer term solution.
>>>>>>>>>> >>>
>>>>>>>>>> >>> Passing recent tests on all executors except jenkins-12, which
>>>>>>>>>> >>> has not scheduled recent builds for the past 13 days. Not
>>>>>>>>>> >>> scheduling:
>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
>>>>>>>>>> >>>
>>>>>>>>>> >>> Recent passing builds:
>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
>>>>>>>>>> >>>
>>>>>>>>>> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <al...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>> >>>
>>>>>>>>>> >>>> +Alan Myrvold <amyrv...@google.com> is doing a one-time
>>>>>>>>>> >>>> cleanup. I agree that we need to have a solution to automate
>>>>>>>>>> >>>> this task or address the root cause of the buildup.
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia <
>>>>>>>>>> michal.wale...@polidea.com>
>>>>>>>>>> >>>> wrote:
>>>>>>>>>> >>>>
>>>>>>>>>> >>>>> Hi there,
>>>>>>>>>> >>>>> It seems we have a problem with Jenkins workers again. Nodes 1
>>>>>>>>>> >>>>> and 7 both fail jobs with "No space left on device". Who is
>>>>>>>>>> >>>>> the best person to contact in these cases (someone with access
>>>>>>>>>> >>>>> permissions to the workers)?
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> I also noticed that such errors are becoming more and more
>>>>>>>>>> >>>>> frequent recently, and I'd like to discuss how this can be
>>>>>>>>>> >>>>> remedied. Can a cleanup task be automated on Jenkins somehow?
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> Regards
>>>>>>>>>> >>>>> Michal
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> --
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> Michał Walenia
>>>>>>>>>> >>>>> Polidea <https://www.polidea.com/> | Software Engineer
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> M: +48 791 432 002
>>>>>>>>>> >>>>> E: michal.wale...@polidea.com
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> Unique Tech
>>>>>>>>>> >>>>> Check out our projects! <https://www.polidea.com/our-work>
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>
>>>>>>>>>> >>
>>>>>>>>>>
>>>>>>>>>
