Thanks, all,

This proposal sounds good to me.

We could disable the current flaky tests as a one-time step and configure
GitHub to allow merging only green builds as a bulwark against future flakiness.

By definition, flaky tests may not fail during the buggy PR build itself, but
they should make themselves known soon afterwards, in the nightly build and in
subsequent PRs. As long as committers are empowered to identify and revert
flakiness-inducing PRs, they should be able to unblock their subsequent PRs.

In other words, I’m biased to think that new flakiness indicates 
non-deterministic bugs more often than it indicates a bad test.

But whatever we do, it’s better than merging red builds. 

Thanks,
John

On Sat, Nov 11, 2023, at 10:48, Ismael Juma wrote:
> One more thing:
>
> 3. If test flakiness is introduced by a recent PR, it's appropriate to
> revert said PR vs disabling the flaky tests.
>
> Ismael
>
> On Sat, Nov 11, 2023, 8:45 AM Ismael Juma <m...@ismaeljuma.com> wrote:
>
>> Hi David,
>>
>> I would be fine with:
>> 1. Only allowing merges if the build is green
>> 2. Disabling all flaky tests that aren't fixed within a week. That is, if
>> a test is flaky for more than a week, it should be automatically disabled
>> (it doesn't add any value since it gets ignored).
>>
>> We need both to make this work; if we only do step 1, we will be
>> stuck with no ability to merge anything.
>>
>> Ismael
>>
>>
>>
>> On Sat, Nov 11, 2023, 2:02 AM David Jacot <dja...@confluent.io.invalid>
>> wrote:
>>
>>> Hi all,
>>>
>>> The state of our CI worries me a lot. Just this week, we merged two PRs
>>> with compilation errors and one PR introducing persistent failures. This
>>> really hurts the quality and the velocity of the project and it basically
>>> defeats the purpose of having a CI because we tend to ignore it nowadays.
>>>
>>> Should we continue to merge without a green build? No! We should not so I
>>> propose to prevent merging a pull request without a green build. This is a
>>> really simple and bold move that will prevent us from introducing
>>> regressions and will improve the overall health of the project. At the same
>>> time, I think that we should disable all the known flaky tests, raise jiras
>>> for them, find an owner for each of them, and fix them.
>>>
>>> What do you think?
>>>
>>> Best,
>>> David
>>>
>>
