+1 to the proposal.

Regarding the "(and not guaranteed to work)" part: if the memory issues
still persist, is the plan to restore the normal retention limit (and look
for another fix), or to never restore the normal retention limit?


Svetak Sundhar

  Technical Solutions Engineer, Data
svetaksund...@google.com



On Tue, Apr 11, 2023 at 10:34 AM Jack McCluskey via dev <dev@beam.apache.org>
wrote:

> +1 for getting Jenkins back into a happier state. Getting release blockers
> resolved ahead of building an RC has been severely hindered by Jenkins not
> picking up tests or running them properly.
>
> On Tue, Apr 11, 2023 at 10:24 AM Danny McCormick via dev <
> dev@beam.apache.org> wrote:
>
>> *tl;dr - I want to temporarily reduce the number of builds that we retain
>> to reduce pressure on Jenkins*
>>
>> Hey everyone, over the past few days our Jenkins runs have been
>> particularly flaky across the board, with errors like the following showing
>> up all over the place [1]:
>>
>> java.nio.file.FileSystemException: 
>> /home/jenkins/jenkins-home/jobs/beam_PreCommit_Python_Phrase/builds/3352/changelog.xml:
>>  No space left on device [2]
>>
>>
>> These errors indicate that we're out of space on the Jenkins master node.
>> After some digging (thanks @Yi Hu <ya...@google.com> @Ahmet Altay
>> <al...@google.com> and @Bruno Volpato <bvolp...@google.com> for
>> contributing), we've determined that at least one large contributing issue
>> is that some of our builds are eating up too much space. For example, our
>> beam_PreCommit_Java_Commit build is taking up 28GB of space by itself.
>>
>> @Yi Hu <ya...@google.com> found one change around code coverage that is
>> likely heavily contributing to the problem and rolled that back [3]. We can
>> continue to find other contributing factors here.
>>
>> In the meantime, to get us back to healthy *I propose that we reduce the
>> number of builds that we are retaining to 40 for all jobs that are using a
>> large amount of storage (>5GB)*. This will hopefully allow us to return
>> Jenkins to a normal functioning state, though it will do so at the cost of
>> a significant amount of build history (right now, for example,
>> beam_PreCommit_Java_Commit is at 400 retained builds). We could restore the
>> normal retention limit once the underlying problem is resolved. Given that
>> this is irreversible (and not guaranteed to work), I wanted to gather
>> feedback before doing this. Personally, I rarely use builds that old, but
>> others may feel differently.
>>
>> Please let me know if you have any objections or support for this
>> proposal.
>>
>> Thanks,
>> Danny
>>
>> [1] Tracking issue: https://github.com/apache/beam/issues/26197
>> [2] Example run with this error:
>> https://ci-beam.apache.org/job/beam_PreCommit_Python_Phrase/3352/console
>> [3] Rollback PR: https://github.com/apache/beam/pull/26199
>>
>
