+1. Thanks, Danny, for figuring out a solution.

Best,
Yi

On Tue, Apr 11, 2023 at 10:56 AM Svetak Sundhar via dev <dev@beam.apache.org>
wrote:

> +1 to the proposal.
>
> Regarding the "(and not guaranteed to work)" part: is the plan that, if
> the storage issues persist, we restore the normal retention limit (and
> look for another fix), or that we never go back to the normal retention
> limit?
>
>
> Svetak Sundhar
>
> Technical Solutions Engineer, Data
> svetaksund...@google.com
>
>
>
> On Tue, Apr 11, 2023 at 10:34 AM Jack McCluskey via dev <
> dev@beam.apache.org> wrote:
>
>> +1 for getting Jenkins back into a happier state, getting release
>> blockers resolved ahead of building an RC has been severely hindered by
>> Jenkins not picking up tests or running them properly.
>>
>> On Tue, Apr 11, 2023 at 10:24 AM Danny McCormick via dev <
>> dev@beam.apache.org> wrote:
>>
>>> *tl;dr: I want to temporarily reduce the number of builds that we
>>> retain to reduce pressure on Jenkins*
>>>
>>> Hey everyone. Over the past few days, our Jenkins runs have been
>>> particularly flaky across the board, with errors like the following showing
>>> up all over the place [1]:
>>>
>>> java.nio.file.FileSystemException: 
>>> /home/jenkins/jenkins-home/jobs/beam_PreCommit_Python_Phrase/builds/3352/changelog.xml:
>>>  No space left on device [2]
>>>
>>>
>>> These errors indicate that we're out of space on the Jenkins master
>>> node. After some digging (thanks @Yi Hu <ya...@google.com>, @Ahmet Altay
>>> <al...@google.com>, and @Bruno Volpato <bvolp...@google.com> for
>>> contributing), we've determined that at least one major contributor is
>>> that some of our builds are eating up too much space. For example, our
>>> beam_PreCommit_Java_Commit build alone is taking up 28GB of space.
>>>
>>> @Yi Hu <ya...@google.com> found one change around code coverage that is
>>> likely heavily contributing to the problem and rolled that back [3]. We can
>>> continue to find other contributing factors here.
>>>
>>> In the meantime, to get us back to a healthy state, *I propose that we
>>> reduce the number of builds that we are retaining to 40 for all jobs that
>>> are using a large amount of storage (>5GB)*. This will hopefully allow us
>>> to return Jenkins to a normal functioning state, though it will do so at
>>> the cost of a significant amount of build history (right now, for example,
>>> beam_PreCommit_Java_Commit is at 400 retained builds). We can restore the
>>> normal retention limit once the underlying problem is resolved, but the
>>> deleted builds cannot be recovered. Given that this is irreversible (and
>>> not guaranteed to work), I wanted to gather feedback before doing this.
>>> Personally, I rarely use builds that old, but others may feel differently.
>>>
>>> Please let me know if you have any objections or support for this
>>> proposal.
>>>
>>> Thanks,
>>> Danny
>>>
>>> [1] Tracking issue: https://github.com/apache/beam/issues/26197
>>> [2] Example run with this error:
>>> https://ci-beam.apache.org/job/beam_PreCommit_Python_Phrase/3352/console
>>> [3] Rollback PR: https://github.com/apache/beam/pull/26199
>>>
>>

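The retention rule proposed in this thread (keep only the newest 40 builds for any job using more than 5GB) can be sketched as plain selection logic. This is a minimal, hypothetical illustration: the function names, the byte-size inputs, and reading "5GB" as 5 GiB are assumptions, not part of Jenkins or Beam tooling. In practice Jenkins enforces a limit like this itself through a job's "Discard old builds" setting (maximum number of builds to keep); the sketch only shows which jobs and builds the rule would select.

```python
from typing import Dict, List

# Assumed values mirroring the proposal: ">5GB" is interpreted as 5 GiB here.
SIZE_THRESHOLD_BYTES = 5 * 1024**3
REDUCED_RETENTION = 40


def jobs_to_trim(job_sizes: Dict[str, int],
                 threshold: int = SIZE_THRESHOLD_BYTES) -> List[str]:
    """Hypothetical helper: names of jobs whose build storage exceeds the threshold."""
    return sorted(name for name, size in job_sizes.items() if size > threshold)


def builds_to_discard(build_numbers: List[int],
                      keep: int = REDUCED_RETENTION) -> List[int]:
    """Hypothetical helper: build numbers outside the newest `keep` builds."""
    # Higher build numbers are newer, so sort descending and keep the first `keep`.
    newest_first = sorted(build_numbers, reverse=True)
    # Everything past the first `keep` entries would be deleted, oldest first.
    return sorted(newest_first[keep:])


if __name__ == "__main__":
    # Sizes and build numbers below are illustrative, not real Jenkins data.
    sizes = {"beam_PreCommit_Java_Commit": 28 * 1024**3, "small_job": 1024**3}
    print(jobs_to_trim(sizes))
    print(len(builds_to_discard(list(range(1, 401)))))
```

With the figures from the thread, a job at 28GB such as beam_PreCommit_Java_Commit would be selected, and going from 400 retained builds to 40 would discard 360 of them.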