+1 Thanks Danny for figuring out a solution. Best, Yi
On Tue, Apr 11, 2023 at 10:56 AM Svetak Sundhar via dev <dev@beam.apache.org> wrote:

> +1 to the proposal.
>
> Regarding the "(and not guaranteed to work)" part, is the resolution that
> the memory issues may still persist and we restore the normal retention
> limit (and we look for another fix), or that we never restore back to the
> normal retention limit?
>
> Svetak Sundhar
> Technical Solutions Engineer, Data
> svetaksund...@google.com
>
> On Tue, Apr 11, 2023 at 10:34 AM Jack McCluskey via dev <
> dev@beam.apache.org> wrote:
>
>> +1 for getting Jenkins back into a happier state, getting release
>> blockers resolved ahead of building an RC has been severely hindered by
>> Jenkins not picking up tests or running them properly.
>>
>> On Tue, Apr 11, 2023 at 10:24 AM Danny McCormick via dev <
>> dev@beam.apache.org> wrote:
>>
>>> *tl;dr - I want to temporarily reduce the number of builds that we
>>> retain to reduce pressure on Jenkins*
>>>
>>> Hey everyone, over the past few days our Jenkins runs have been
>>> particularly flaky across the board, with errors like the following showing
>>> up all over the place [1]:
>>>
>>> java.nio.file.FileSystemException:
>>> /home/jenkins/jenkins-home/jobs/beam_PreCommit_Python_Phrase/builds/3352/changelog.xml:
>>> No space left on device [2]
>>>
>>> These errors indicate that we're out of space on the Jenkins master
>>> node. After some digging (thanks @Yi Hu <ya...@google.com> @Ahmet Altay
>>> <al...@google.com> and @Bruno Volpato <bvolp...@google.com> for
>>> contributing), we've determined that at least one large contributing issue
>>> is that some of our builds are eating up too much space. For example, our
>>> beam_PreCommit_Java_Commit build is taking up 28GB of space by itself (this
>>> is just one example).
>>>
>>> @Yi Hu <ya...@google.com> found one change around code coverage that is
>>> likely heavily contributing to the problem and rolled that back [3]. We can
>>> continue to find other contributing factors here.
>>>
>>> In the meantime, to get us back to healthy *I propose that we reduce
>>> the number of builds that we are retaining to 40 for all jobs that are
>>> using a large amount of storage (>5GB)*. This will hopefully allow us
>>> to return Jenkins to a normal functioning state, though it will do so at
>>> the cost of a significant amount of build history (right now, for example,
>>> beam_PreCommit_Java_Commit is at 400 retained builds). We could restore the
>>> normal retention limit once the underlying problem is resolved. Given that
>>> this is irreversible (and not guaranteed to work), I wanted to gather
>>> feedback before doing this. Personally, I rarely use builds that old, but
>>> others may feel differently.
>>>
>>> Please let me know if you have any objections or support for this
>>> proposal.
>>>
>>> Thanks,
>>> Danny
>>>
>>> [1] Tracking issue: https://github.com/apache/beam/issues/26197
>>> [2] Example run with this error:
>>> https://ci-beam.apache.org/job/beam_PreCommit_Python_Phrase/3352/console
>>> [3] Rollback PR: https://github.com/apache/beam/pull/26199
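
For anyone who wants to see which jobs would fall under the proposed ">5GB" cutoff, here is a minimal sketch. It is not the tooling used in this thread. It assumes each job lives in its own directory under the Jenkins `jobs/` folder (as the path in the error above suggests), and the function names `dir_size_bytes` and `jobs_over_threshold` are hypothetical.

```python
import os

GIB = 1024 ** 3  # bytes per GiB; the thread's "5GB" is treated as 5 GiB here (an assumption)


def dir_size_bytes(path):
    """Sum the sizes of regular files under path, skipping symlinked files."""
    total = 0
    # os.walk does not descend into symlinked directories by default.
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def jobs_over_threshold(jobs_dir, threshold_bytes=5 * GIB):
    """Return {job_name: size_in_bytes} for job directories larger than the threshold."""
    large = {}
    for entry in os.scandir(jobs_dir):
        if entry.is_dir(follow_symlinks=False):
            size = dir_size_bytes(entry.path)
            if size > threshold_bytes:
                large[entry.name] = size
    return large
```

Run on the Jenkins host as `jobs_over_threshold("/home/jenkins/jenkins-home/jobs")`, this would list the candidates for the reduced retention limit. It only reports sizes and does not change any retention settings.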