> I have the feeling that the thing that prevents us primarily from cutting beta at the moment is flaky tests.
CASSANDRA-15299 is still in progress, and I think we have to consider it a blocker, given that beta "should be interface-stable, so that consumers do not have to incur any code changes on their end, as the release progresses from Alpha through EOL."

> On 28 May 2020, at 01:23, Joshua McKenzie <jmcken...@apache.org> wrote:
>
>> So my idea was to suggest that we maybe start tracking an exact Jenkins report?
>
> Basing our point of view on the canonical test runs on Apache infra makes
> sense to me, assuming that infra is behaving these days. :) Pretty sure
> Mick got that in working order.
>
> At least for me, what I learned in the past is that we'd drive to a green test
> board and immediately mark it as a milestone, so flaky tests would
> reappear like a disappointing game of whack-a-mole. They seem frustratingly
> ever-present.
>
> I'd personally advocate for us taking the following stance on flaky tests
> from this point in the cycle forward:
>
> - Default posture: label the fix version as beta
> - *Excepting*, on a case-by-case basis, flakes that could imply a product defect
>   that would greatly impair beta testing; those we leave at alpha
> - Take current flakes and move them to fixver beta
> - Hard, no-compromise position on "we don't RC until all flakes are dead"
> - Use Jenkins as the canonical source of truth for the "is beta ready" cutoff
>
> I'm personally balancing the risk of flaky tests confounding beta work
> against my perceived value of being able to widely signal beta's
> availability and encourage widespread user testing. I believe the value of
> the latter justifies the risk of the former (I currently perceive that risk
> as minimal; I could be wrong). I am also weighting the risk of "test
> failures persist to or past RC" at 0. That's a hill I'll die on.
>
>
> On Wed, May 27, 2020 at 5:13 PM Ekaterina Dimitrova <
> ekaterina.dimitr...@datastax.com> wrote:
>
>> Dear all,
>> I spent some time these days looking into the Release Lifecycle document.
>> As we keep saying we are approaching Beta based on the Jira board, I was
>> curious what the exact borderline is for cutting it.
>>
>> Looking at all the latest reports (thanks to everyone who has been working on
>> them; I think having an overview of what's going on is always a good
>> thing), I have the feeling that the thing primarily preventing us from
>> cutting beta at the moment is flaky tests. According to the lifecycle
>> document:
>>
>> - "No flaky tests - All tests (Unit Tests and DTests) should pass
>>   consistently. A failing test, upon analyzing the root cause of failure,
>>   may be 'ignored in exceptional cases', if appropriate, for the release,
>>   after discussion in the dev mailing list."
>>
>> Now, the related questions that popped into my mind:
>> - "Ignored in exceptional cases" - examples?
>> - No flaky tests according to Jenkins or CircleCI? Also, some people run
>>   the free tier, while others take advantage of premium CircleCI. What
>>   should the framework be?
>> - Furthermore, flaky tests at what frequency? (This is a tricky question,
>>   I know.)
>>
>> In different conversations with colleagues from the C* community, I got the
>> impression that the canonical suite (in this case Jenkins) might be the right
>> direction to follow.
>>
>> To be clear, I always check any failures seen in any environment, and
>> I truly believe they are worth checking. I am not advocating skipping
>> anything! But I also feel that in many cases CircleCI could provide input
>> worth tracking that is less likely to reflect product flakes. Am I
>> right? In addition, different people use different CircleCI configs and see
>> different output. Not to mention flaky tests on a Mac running with two
>> cores... Yes, this is sometimes the only way to reproduce some of the
>> reported test issues...
>>
>> So my idea was to suggest that we maybe start tracking an exact Jenkins report?
>> Anything reported outside of it would also be checked, but could
>> potentially be left for Beta if we don't feel it shows a product defect.
>> One more thing to consider is that the big Test epic is primarily
>> happening in beta.
>>
>> Curious to hear what the community thinks about this topic. People
>> probably also have additional thoughts based on experience from previous
>> releases. How did those things work in the past? Any lessons learned?
>> What is our "plan Beta"?
>>
>> Ekaterina Dimitrova
>> e. ekaterina.dimitr...@datastax.com
>> w. www.datastax.com

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org