I'm probably late to this discussion and missing something, but why are we
writing to /tmp at all? I would expect TMPDIR to point somewhere inside the
job directory that will be wiped by Jenkins, and I would expect code to
always create temp files via APIs that respect this. Is Jenkins not
cleaning up? Do we not have the ability to set this up? Do we have bugs in
our code (which we could probably find by setting TMPDIR to somewhere other
than /tmp and running the tests without write permission to /tmp, etc.)?
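
To make "APIs that respect this" concrete, here is a rough Python sketch
(not taken from our code). The `temp_dir_for` helper and the `tmp`
subdirectory name are made up for illustration; the only real behavior it
relies on is that Python's `tempfile` module consults TMPDIR when it picks
its default directory.

```python
import os
import tempfile


def temp_dir_for(workspace):
    """Return the directory tempfile uses when TMPDIR points into `workspace`.

    `workspace` stands in for a Jenkins job directory; the "tmp"
    subdirectory is a hypothetical layout, not an existing convention.
    """
    job_tmp = os.path.join(workspace, "tmp")
    os.makedirs(job_tmp, exist_ok=True)

    previous = os.environ.get("TMPDIR")
    os.environ["TMPDIR"] = job_tmp
    # tempfile caches its default directory; clearing the cache makes it
    # re-read TMPDIR on the next call.
    tempfile.tempdir = None
    try:
        # No dir= argument, so the file lands wherever TMPDIR says.
        with tempfile.NamedTemporaryFile() as handle:
            return os.path.dirname(handle.name)
    finally:
        # Restore the environment so the caller is unaffected.
        if previous is None:
            del os.environ["TMPDIR"]
        else:
            os.environ["TMPDIR"] = previous
        tempfile.tempdir = None
```

Code that hardcodes a path such as "/tmp/foo" would bypass this, which is
the kind of bug a run with a non-writable /tmp should surface.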

Kenn

On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay <al...@google.com> wrote:

> Related to workspace directory growth, +Udi Meiri <eh...@google.com> filed
> a relevant issue previously (
> https://issues.apache.org/jira/browse/BEAM-9865) for cleaning up the
> workspace directory after successful jobs. Alternatively, we can consider
> periodically cleaning up the /src directories.
>
> I would suggest moving the cron task from internal cron scripts to the
> inventory job (
> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51).
> That way, we can see all the cron jobs as part of the source tree, and
> adjust frequencies and clean up code with PRs. I do not know how internal
> cron scripts are created or maintained, or how they would be recreated for
> new worker instances.
>
> /cc +Tyson Hamilton <tyso...@google.com>
>
> On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski <
> damian.gadom...@polidea.com> wrote:
>
>> Hey,
>>
>> I've recently created a solution for the growing /tmp directory. Part of
>> it is the job mentioned by Tyson: *beam_Clean_tmp_directory*. It's
>> intentionally not triggered by cron and should be a last resort solution
>> for some strange cases.
>>
>> Along with that job, I've also updated every worker with an internal cron
>> script. It runs once a week and deletes all the files (and only files)
>> that have not been accessed for at least three days. That's designed to
>> be as safe as possible for the running jobs on the worker (not to delete
>> the files that are still in use), and also to be insensitive to the current
>> workload on the machine. The cleanup will always happen, even if some
>> long-running/stuck jobs are blocking the machine.
>>
>> I also think that currently the "No space left" errors may be a
>> consequence of the growing workspace directory rather than /tmp. I didn't
>> do any detailed analysis, but e.g. currently, on apache-beam-jenkins-7 the
>> workspace directory size is 158 GB while /tmp is only 16 GB. We should
>> either guarantee the disk size to hold workspaces for all jobs (because
>> eventually, every worker will execute each job) or also clear the
>> workspaces in some way.
>>
>> Regards,
>> Damian
>>
>>
>> On Mon, Jul 20, 2020 at 10:43 AM Maximilian Michels <m...@apache.org>
>> wrote:
>>
>>> +1 for scheduling it via a cron job if it won't lead to test failures
>>> while running. Not a Jenkins expert but maybe there is the notion of
>>> running exclusively while no other tasks are running?
>>>
>>> -Max
>>>
>>> On 17.07.20 21:49, Tyson Hamilton wrote:
>>> > FYI there was a job introduced to do this in Jenkins:
>>> > beam_Clean_tmp_directory
>>> >
>>> > Currently it needs to be run manually. I'm seeing some out-of-disk
>>> > errors in precommit tests; perhaps we should schedule this job with
>>> > cron?
>>> >
>>> >
>>> > On 2020/03/11 19:31:13, Heejong Lee <heej...@google.com> wrote:
>>> >> Still seeing no space left on device errors on jenkins-7 (for example:
>>> >> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/)
>>> >>
>>> >>
>>> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <amyrv...@google.com>
>>> >> wrote:
>>> >>
>>> >>> Did a one-time cleanup of tmp files owned by jenkins older than 3
>>> >>> days. Agree that we need a longer-term solution.
>>> >>>
>>> >>> Passing recent tests on all executors except jenkins-12, which has
>>> >>> not scheduled recent builds for the past 13 days. Not scheduling:
>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
>>> >>> Recent passing builds:
>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
>>> >>>
>>> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <al...@google.com>
>>> >>> wrote:
>>> >>>
>>> >>>> +Alan Myrvold <amyrv...@google.com> is doing a one-time cleanup. I
>>> >>>> agree that we need to have a solution to automate this task or
>>> >>>> address the root cause of the buildup.
>>> >>>>
>>> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia <
>>> >>>> michal.wale...@polidea.com> wrote:
>>> >>>>
>>> >>>>> Hi there,
>>> >>>>> it seems we have a problem with Jenkins workers again. Nodes 1 and 7
>>> >>>>> both fail jobs with "No space left on device".
>>> >>>>> Who is the best person to contact in these cases (someone with access
>>> >>>>> permissions to the workers)?
>>> >>>>>
>>> >>>>> I also noticed that such errors are becoming more and more frequent
>>> >>>>> recently, and I'd like to discuss how this can be remedied. Can a
>>> >>>>> cleanup task be automated on Jenkins somehow?
>>> >>>>>
>>> >>>>> Regards
>>> >>>>> Michal
>>> >>>>>
>>> >>>>> --
>>> >>>>>
>>> >>>>> Michał Walenia
>>> >>>>> Polidea <https://www.polidea.com/> | Software Engineer
>>> >>>>>
>>> >>>>> M: +48 791 432 002
>>> >>>>> E: michal.wale...@polidea.com
>>> >>>>>
>>> >>>>> Unique Tech
>>> >>>>> Check out our projects! <https://www.polidea.com/our-work>
>>> >>>>>
>>> >>>>
>>> >>
>>>
>>
