> - No flaky tests according to Jenkins or CircleCI? Also, some people run
> the free tier, others take advantage of premium CircleCI. What should be
> the framework?

It would be good to have a common understanding of this; my current mental 
model is

1) Jenkins
2) Circle CI free tier unit tests (including in-jvm dtests)
3) Circle CI paid tier python dtests

> - "ignored in exceptional cases" - examples?


I personally don’t classify a test as flaky if the CI environment is at fault; 
a simple example is a bad disk causing tests to fail.  In that case, action 
should be taken to fix the CI environment, but if the tests pass in another 
environment I am fine moving on and not blocking a release.

> I got the impression that canonical suite (in this case Jenkins) might be the 
> right direction to follow.


I agree that Jenkins must be a source of input, but I don’t think it should be 
the only one at this moment; currently Circle CI produces more builds of 
Cassandra than Jenkins, so ignoring test failures there creates a more unstable 
environment for development and hides the fact that Jenkins will also see the 
same issues.  There are also gaps in Jenkins coverage which hide things such as 
the lack of Java 11 support and the fact that tests fail more often on Java 11.

> But also, sometimes I feel in many cases CircleCI could provide input worth 
> tracking but less likely to be product flakes


Since Circle CI runs more builds than Jenkins, we are more likely to see flaky 
tests there than on Jenkins.
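(For intuition only, with made-up numbers: a test that flakes on 1% of runs 
fails at least once with probability 1 - 0.99^100 ≈ 63% across 100 builds, but 
only ≈ 26% across 30 builds, so whichever platform runs more builds will tend 
to surface a given flake first.)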

> Not to mention flaky tests on Mac running with two cores... Yes, this is 
> sometimes the only way to reproduce some of the reported tests' issues...


I am not aware of anyone opening JIRAs based off of this; it is only used as a 
method to reproduce issues found in CI.  I started using this method to quickly 
reproduce race condition bugs found in CI, such as nodetool reporting repairs as 
successful when they had actually failed, and one case you are working on where 
a preview repair conflicts with a non-committed IR participant even though we 
reported the commit to users (both cases are valid bugs found in CI).
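
For anyone curious about the mechanics, here is a minimal, self-contained 
sketch of the repeat-until-it-fails approach; it is plain Java rather than 
Cassandra code, and the class name and iteration counts are made up for 
illustration.  The idea is simply to run the suspect interleaving many times in 
one process, since a single pass rarely hits the bad schedule, and running on a 
box with fewer cores makes the bad schedule more likely.

    import java.util.concurrent.CountDownLatch;

    public class RaceRepro
    {
        static int counter; // intentionally unsynchronized so the lost-update race is observable

        public static void main(String[] args) throws InterruptedException
        {
            for (int attempt = 1; attempt <= 1000; attempt++)
            {
                counter = 0;
                CountDownLatch start = new CountDownLatch(1);
                Runnable bump = () -> {
                    try { start.await(); } catch (InterruptedException e) { return; }
                    for (int i = 0; i < 10_000; i++)
                        counter++; // racy read-modify-write on a shared field
                };
                Thread t1 = new Thread(bump);
                Thread t2 = new Thread(bump);
                t1.start();
                t2.start();
                start.countDown(); // release both threads at the same moment
                t1.join();
                t2.join();
                if (counter != 20_000) // a lost update means the race was hit
                {
                    System.out.println("race reproduced on attempt " + attempt + ": counter = " + counter);
                    return;
                }
            }
            System.out.println("no race observed; raise the iteration count or pin the JVM to fewer cores");
        }
    }

The actual repair bugs above were reproduced against the real test classes the 
same way: tight repetition plus a constrained number of cores.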

> So my idea was to suggest to start tracking an exact Jenkins report maybe


Better visibility is great!  Mick has set up Slack/email notifications, but 
maybe a summary in the 4.0 report would also help raise visibility for everyone?

> checked but potentially to be able to leave it for Beta in case we don't feel 
> it shows a product defect

Based on 
https://cwiki.apache.org/confluence/display/CASSANDRA/Release+Lifecycle, flaky 
tests block beta releases, so fixing them needs to happen before then.  What do 
you mean by “leave it for Beta”?  Right now we label such tickets alpha but 
don’t block alpha releases on flaky tests; given this I don’t follow the 
statement, could you explain more?

>> At least for me, what I learned in the past is we'd drive to a green test
>> board and immediately transition it as a milestone, so flaky tests would
>> reappear like a disappointing game of whack-a-mole. They seem frustratingly
>> ever-present.

The way I read the document, all definitions/expectations from previous phases 
hold true for later stages.  Right now the document says we cannot cut beta1 
until flaky tests are resolved, but this would also apply to beta2+, rc+, etc.; 
the way I internalize this is that from pre-beta1 onward, flaky tests are not 
allowed, so we don’t immediately transition away from this.

One trend I have noticed in Cassandra is a lack of trust in tests, caused by 
the fact that unrelated failing builds are common; what then happens is the 
author/reviewer ignores the new failing test, writes it off as flaky, commits, 
and causes more tests to fail.  Since testing can be skipped pre-commit and 
failing tests can be ignored, we end up in a state where new regressions pop up 
after commit; having flaky tests act as a guard against release creates a 
forcing function to stay stable for as long as possible.

>> Default posture to label fix version as beta

Can you explain what you mean by this?  Currently we don’t block alpha releases 
on flaky tests even though they are marked alpha.  Are you proposing we don’t 
block beta releases on flaky tests, or are you suggesting we label them beta to 
better match the doc and keep them as beta release blockers?

>>> Also, I agree with Mick that it’s good to have a plan and opened Jira 
>>> tickets earlier than later.

+1

> On May 28, 2020, at 10:02 AM, Joshua McKenzie <jmcken...@apache.org> wrote:
> 
> Good point Jordan re: flaky test being either implying API instability or
> blocker to ability to beta test.
> 
> 
> On Thu, May 28, 2020 at 12:56 PM Jordan West <jw...@apache.org> wrote:
> 
>>> On Wed, May 27, 2020 at 5:13 PM Ekaterina Dimitrova <
>>> ekaterina.dimitr...@datastax.com> wrote:
>> 
>>> - No flaky tests according to Jenkins or CircleCI? Also, some people run
>>>> the free tier, others take advantage of premium CircleCI. What should
>> be
>>>> the framework?
>> 
>> 
>> While I agree that we should use the Apache infrastructure as the canonical
>> infrastructure, failures in both (or any) environment matter when it comes
>> to flaky tests.
>> 
>> On Wed, May 27, 2020 at 5:23 PM Joshua McKenzie <jmcken...@apache.org>
>> wrote:
>> 
>>> 
>>> At least for me, what I learned in the past is we'd drive to a green test
>>> board and immediately transition it as a milestone, so flaky tests would
>>> reappear like a disappointing game of whack-a-mole. They seem
>> frustratingly
>>> ever-present.
>>> 
>>> 
>> Agreed. Having multiple successive green runs would be a better bar than
>> one on a single platform imo.
>> 
>> 
>>> I'd personally advocate for us taking the following stance on flaky tests
>>> from this point in the cycle forward:
>>> 
>>>   - Default posture to label fix version as beta
>>>   - *excepting* on case-by-case basis, if flake could imply product
>> defect
>>>   that would greatly impair beta testing we leave alpha
>>> 
>> 
>> I would be in favor of tightening this further to flakes that imply
>> interface changes or major defects (e.g. corruption, data loss, etc). To do
>> so would require evaluation of the flaky test, however, which I think is in
>> sync with our "start in alpha and make exceptions to move to beta". The
>> difference would be that we better define and widen what flaky tests can be
>> punted to beta and my guess is we could already evaluate all outstanding
>> flaky test tickets by that bar.
>> 
>> Jordan
>> 
