CASSANDRA-15299 - all still-open interface-related tickets are blockers. My point was that there are only a few of them left, looking at Jira. So apart from those, flaky tests are really the thing that requires attention.
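On the "flaky tests with what frequency" question further down: one rough way to quantify a suspected flake is simply to rerun it in a loop and count failures. A minimal sketch in Python (the ant invocation below is only a placeholder - substitute whatever command actually runs the suspect test in your environment, and the run count is arbitrary):

    # Rerun a single suspected-flaky test repeatedly and report its failure rate.
    import subprocess

    # Placeholder invocation - adjust to the actual test runner and test name.
    TEST_CMD = ["ant", "testsome", "-Dtest.name=SomeSuspectTest"]
    RUNS = 50

    failures = 0
    for _ in range(RUNS):
        result = subprocess.run(TEST_CMD, capture_output=True)
        if result.returncode != 0:
            failures += 1

    print("%d/%d runs failed (%.1f%% flake rate)" % (failures, RUNS, 100.0 * failures / RUNS))

Something like that at least gives us a number to argue about when deciding whether a failure is rare enough to be "ignored in exceptional cases".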
Also, I agree with Mick that it's good to have a plan and to open Jira
tickets sooner rather than later.

On Thu, 28 May 2020 at 5:27, Sam Tunnicliffe <s...@beobal.com> wrote:

> > I have the feeling that the thing that prevents us primarily from
> > cutting beta at the moment is flaky tests
>
> CASSANDRA-15299 is still in progress and I think we have to consider it a
> blocker, given that beta "should be interface-stable, so that consumers do
> not have to incur any code changes on their end, as the release progresses
> from Alpha through EOL."
>
>
> > On 28 May 2020, at 01:23, Joshua McKenzie <jmcken...@apache.org> wrote:
> >
> >> So my idea was to suggest to start tracking an exact Jenkins report
> >> maybe?
> >
> > Basing our point of view on the canonical test runs on apache infra makes
> > sense to me, assuming that infra is behaving these days. :) Pretty sure
> > Mick got that in working order.
> >
> > At least for me, what I learned in the past is we'd drive to a green test
> > board and immediately transition it as a milestone, so flaky tests would
> > reappear like a disappointing game of whack-a-mole. They seem
> > frustratingly ever-present.
> >
> > I'd personally advocate for us taking the following stance on flaky tests
> > from this point in the cycle forward:
> >
> >   - Default posture to label fix version as beta
> >     - *excepting* on case-by-case basis, if flake could imply product
> >       defect that would greatly impair beta testing we leave alpha
> >   - Take current flakes and go fixver beta
> >   - Hard, no compromise position on "we don't RC until all flakes are dead"
> >   - Use Jenkins as canonical source of truth for "is beta ready" cutoff
> >
> > I'm personally balancing the risk of flaky tests confounding beta work
> > against my perceived value of being able to widely signal beta's
> > availability and encourage widespread user testing. I believe the value in
> > the latter justifies the risk of the former (I currently perceive that risk
> > as minimal; I could be wrong). I am also weighting the risk of "test
> > failures persist to or past RC" at 0. That's a hill I'll die on.
> >
> >
> > On Wed, May 27, 2020 at 5:13 PM Ekaterina Dimitrova <
> > ekaterina.dimitr...@datastax.com> wrote:
> >
> >> Dear all,
> >> I spent some time these days looking into the Release Lifecycle document.
> >> As we keep on saying we approach Beta based on the Jira board, I was
> >> curious what is the exact borderline to cut it.
> >>
> >> Looking at all the latest reports (thanks to everyone who was working on
> >> that; I think having an overview on what's going on is always a good
> >> thing), I have the feeling that the thing that prevents us primarily from
> >> cutting beta at the moment is flaky tests. According to the lifecycle
> >> document:
> >>
> >>   - No flaky tests - All tests (Unit Tests and DTests) should pass
> >>     consistently. A failing test, upon analyzing the root cause of failure,
> >>     may be "ignored in exceptional cases", if appropriate, for the release,
> >>     after discussion in the dev mailing list."
> >>
> >> Now the related questions that popped up into my mind:
> >> - "ignored in exceptional cases" - examples?
> >> - No flaky tests according to Jenkins or CircleCI? Also, some people run
> >>   the free tier, others take advantage of premium CircleCI. What should be
> >>   the framework?
> >> - Furthermore, flaky tests with what frequency? (This is a tricky question,
> >>   I know)
> >>
> >> In different conversations with colleagues from the C* community I got the
> >> impression that canonical suite (in this case Jenkins) might be the right
> >> direction to follow.
> >>
> >> To be clear, I am always checking any failures seen in any environment and
> >> I truly believe that they are worth it to be checked. Not advocating to
> >> skip anything! But also, sometimes I feel in many cases CircleCI could
> >> provide input worth tracking but less likely to be product flakes. Am I
> >> right? In addition, different people use different CircleCI config and see
> >> different output. Not to mention flaky tests on Mac running with two
> >> cores... Yes, this is sometimes the only way to reproduce some of the
> >> reported tests' issues...
> >>
> >> So my idea was to suggest to start tracking an exact Jenkins report maybe?
> >> Anything reported out of it also to be checked but potentially to be able
> >> to leave it for Beta in case we don't feel it shows a product defect. One
> >> more thing to consider is that the big Test epic is primarily happening in
> >> beta.
> >>
> >> Curious to hear what the community thinks about this topic. Probably people
> >> also have additional thoughts based on experience from the previous
> >> releases. How those things worked in the past? Any lessons learned? What is
> >> our "plan Beta"?
> >>
> >> Ekaterina Dimitrova
> >> e. ekaterina.dimitr...@datastax.com
> >> w. www.datastax.com