+1

SGTM

Remember, if an issue is being investigated, a committer can always mark a
build to be retained longer in the Jenkins UI. Just be sure to clean it up
once it's resolved though.

(TBH there may also be some old retained builds like that, but I doubt
there's a good way to see which are still relevant.)

On Tue, Apr 11, 2023, 8:03 AM Yi Hu via dev <dev@beam.apache.org> wrote:

> +1 Thanks Danny for figuring out a solution.
>
> Best,
> Yi
>
> On Tue, Apr 11, 2023 at 10:56 AM Svetak Sundhar via dev <
> dev@beam.apache.org> wrote:
>
>> +1 to the proposal.
>>
>> Regarding the "(and not guaranteed to work)" part: if the disk space
>> issues persist, is the plan to restore the normal retention limit (and
>> look for another fix), or to never restore the normal retention limit?
>>
>>
>> Svetak Sundhar
>>
>>   Technical Solutions Engineer, Data
>> svetaksund...@google.com
>>
>>
>>
>> On Tue, Apr 11, 2023 at 10:34 AM Jack McCluskey via dev <
>> dev@beam.apache.org> wrote:
>>
>>> +1 for getting Jenkins back into a happier state. Getting release
>>> blockers resolved ahead of building an RC has been severely hindered by
>>> Jenkins not picking up tests or not running them properly.
>>>
>>> On Tue, Apr 11, 2023 at 10:24 AM Danny McCormick via dev <
>>> dev@beam.apache.org> wrote:
>>>
>>>> *tl;dr - I want to temporarily reduce the number of builds that we
>>>> retain to reduce pressure on Jenkins*
>>>>
>>>> Hey everyone, over the past few days our Jenkins runs have been
>>>> particularly flaky across the board, with errors like the following showing
>>>> up all over the place [1]:
>>>>
>>>> java.nio.file.FileSystemException: /home/jenkins/jenkins-home/jobs/beam_PreCommit_Python_Phrase/builds/3352/changelog.xml: No space left on device [2]
>>>>
>>>>
>>>> These errors indicate that we're out of space on the Jenkins master
>>>> node. After some digging (thanks @Yi Hu <ya...@google.com> @Ahmet Altay
>>>> <al...@google.com> and @Bruno Volpato <bvolp...@google.com> for
>>>> contributing), we've determined that at least one large contributing issue
>>>> is that some of our builds are eating up too much space. For example, our
>>>> beam_PreCommit_Java_Commit build is taking up 28GB of space by itself (this
>>>> is just one example).
>>>>
>>>> @Yi Hu <ya...@google.com> found one change around code coverage that
>>>> is likely heavily contributing to the problem and rolled that back [3]. We
>>>> can continue to find other contributing factors here.
>>>>
>>>> In the meantime, to get us back to a healthy state, *I propose that we reduce
>>>> the number of builds that we are retaining to 40 for all jobs that are
>>>> using a large amount of storage (>5GB)*. This will hopefully allow us
>>>> to return Jenkins to a normal functioning state, though it will do so at
>>>> the cost of a significant amount of build history (right now, for example,
>>>> beam_PreCommit_Java_Commit is at 400 retained builds). We could restore the
>>>> normal retention limit once the underlying problem is resolved. Given that
>>>> this is irreversible (and not guaranteed to work), I wanted to gather
>>>> feedback before doing this. Personally, I rarely use builds that old, but
>>>> others may feel differently.
>>>>
>>>> Please let me know if you have any objections or support for this
>>>> proposal.
>>>>
>>>> Thanks,
>>>> Danny
>>>>
>>>> [1] Tracking issue: https://github.com/apache/beam/issues/26197
>>>> [2] Example run with this error:
>>>> https://ci-beam.apache.org/job/beam_PreCommit_Python_Phrase/3352/console
>>>> [3] Rollback PR: https://github.com/apache/beam/pull/26199
>>>>
>>>
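
For reference, the selection rule in the proposal (cap retention at 40 builds for jobs using more than 5GB) could be sketched roughly as below. This is only an illustration, not how the Beam Jenkins jobs are actually configured: `plan_retention`, its parameters, and the small job in the usage note are hypothetical names, and the sketch assumes per-job disk usage and current limits have already been collected by some other means.

```python
from typing import Dict

GIB = 1024 ** 3  # bytes per GiB; the thread says "GB" loosely


def plan_retention(job_sizes_bytes: Dict[str, int],
                   current_limits: Dict[str, int],
                   size_threshold_bytes: int = 5 * GIB,
                   reduced_limit: int = 40) -> Dict[str, int]:
    """Return the new retained-build limit for each job that should change.

    Hypothetical helper: a job is selected only if it uses more than the
    size threshold and its current limit is unknown or above the reduced one.
    """
    plan: Dict[str, int] = {}
    for job, size in job_sizes_bytes.items():
        current = current_limits.get(job)
        over_threshold = size > size_threshold_bytes
        needs_reduction = current is None or current > reduced_limit
        if over_threshold and needs_reduction:
            plan[job] = reduced_limit
    return plan
```

With the figures from the thread (beam_PreCommit_Java_Commit at 28GB and 400 retained builds), that job would be selected, while a hypothetical job using 1GB would be left alone. Jobs already at or below 40 retained builds are skipped so the limit is never raised by accident.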
