Agree re: 15299. As I understand it, this thread is about pushing flaky tests out and how we define that cohort.
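On the "track one exact Jenkins report" idea further down the thread: to make that concrete, here is a rough sketch of one way it could look, purely a strawman. The job URL and the standard JUnit testReport JSON layout (suites -> cases with className/name/status) are assumptions on my part, so please correct them if the canonical job is set up differently.

#!/usr/bin/env python3
# Rough sketch (not a finished tool): pull failing test names from the last
# completed build of the canonical Jenkins job, so flakes can be tracked
# against a single report. Assumptions: the job URL below is hypothetical,
# and the report follows the standard JUnit testReport JSON layout
# (suites -> cases with className / name / status).
import json
import urllib.request

# Hypothetical job URL -- substitute whichever pipeline we agree is canonical.
JOB_URL = "https://ci-cassandra.apache.org/job/Cassandra-trunk"
REPORT_URL = JOB_URL + "/lastCompletedBuild/testReport/api/json"

def failing_tests(report_url=REPORT_URL):
    """Yield 'ClassName.testName' for every case Jenkins marks FAILED/REGRESSION."""
    with urllib.request.urlopen(report_url) as resp:
        report = json.load(resp)
    for suite in report.get("suites", []):
        for case in suite.get("cases", []):
            if case.get("status") in ("FAILED", "REGRESSION"):
                yield "{}.{}".format(case.get("className"), case.get("name"))

if __name__ == "__main__":
    for test in sorted(set(failing_tests())):
        print(test)

Running something like that per canonical build and diffing the output against the flaky-test tickets in Jira would at least tell us whether the board is trending toward green; again, just a strawman to make the "one exact report" idea concrete.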
On Thu, May 28, 2020 at 7:59 AM Ekaterina Dimitrova <e.dimitr...@gmail.com> wrote:

> CASSANDRA-15299 - all still-open interface-related tickets are blockers.
> My point was that, looking at Jira, there are already just a few of them.
> So apart from those, flaky tests really are the thing that requires
> attention.
>
> Also, I agree with Mick that it's good to have a plan and open Jira
> tickets sooner rather than later.
>
> On Thu, 28 May 2020 at 5:27, Sam Tunnicliffe <s...@beobal.com> wrote:
>
> > > > I have the feeling that the thing that prevents us primarily from
> > > > cutting beta at the moment is flaky tests
> >
> > CASSANDRA-15299 is still in progress and I think we have to consider it
> > a blocker, given that beta "should be interface-stable, so that
> > consumers do not have to incur any code changes on their end, as the
> > release progresses from Alpha through EOL."
> >
> > > On 28 May 2020, at 01:23, Joshua McKenzie <jmcken...@apache.org> wrote:
> > >
> > >> So my idea was to suggest to start tracking an exact Jenkins report
> > >> maybe?
> > >
> > > Basing our point of view on the canonical test runs on Apache infra
> > > makes sense to me, assuming that infra is behaving these days. :)
> > > Pretty sure Mick got that in working order.
> > >
> > > At least for me, what I learned in the past is that we'd drive to a
> > > green test board and immediately treat it as a milestone behind us,
> > > and flaky tests would reappear like a disappointing game of
> > > whack-a-mole. They seem frustratingly ever-present.
> > >
> > > I'd personally advocate for us taking the following stance on flaky
> > > tests from this point in the cycle forward:
> > >
> > > - Default posture: label the fix version as beta
> > >   - *excepting*, on a case-by-case basis, flakes that could imply a
> > >     product defect that would greatly impair beta testing; those we
> > >     leave at alpha
> > > - Take current flakes and move them to fixver beta
> > > - Hard, no-compromise position on "we don't RC until all flakes are
> > >   dead"
> > > - Use Jenkins as the canonical source of truth for the "is beta ready"
> > >   cutoff
> > >
> > > I'm personally balancing the risk of flaky tests confounding beta work
> > > against my perceived value of being able to widely signal beta's
> > > availability and encourage widespread user testing. I believe the
> > > value of the latter justifies the risk of the former (I currently
> > > perceive that risk as minimal; I could be wrong). I am also weighting
> > > the risk of "test failures persist to or past RC" at 0. That's a hill
> > > I'll die on.
> > >
> > > On Wed, May 27, 2020 at 5:13 PM Ekaterina Dimitrova <
> > > ekaterina.dimitr...@datastax.com> wrote:
> > >
> > >> Dear all,
> > >> I spent some time over the last few days looking into the Release
> > >> Lifecycle document. As we keep saying that we are approaching Beta
> > >> based on the Jira board, I was curious where exactly the borderline
> > >> for cutting it lies.
> > >>
> > >> Looking at all the latest reports (thanks to everyone who has been
> > >> working on that; I think having an overview of what's going on is
> > >> always a good thing), I have the feeling that the thing that
> > >> primarily prevents us from cutting beta at the moment is flaky tests.
> > >> According to the lifecycle document:
> > >>
> > >> - No flaky tests - All tests (Unit Tests and DTests) should pass
> > >>   consistently. A failing test, upon analyzing the root cause of
> > >>   failure, may be "ignored in exceptional cases", if appropriate, for
> > >>   the release, after discussion in the dev mailing list.
> > >>
> > >> Now the related questions that popped into my mind:
> > >> - "ignored in exceptional cases" - examples?
> > >> - No flaky tests according to Jenkins or CircleCI? Also, some people
> > >>   run the free tier, others take advantage of premium CircleCI. What
> > >>   should the reference framework be?
> > >> - Furthermore, flaky at what frequency? (This is a tricky question, I
> > >>   know)
> > >>
> > >> In different conversations with colleagues from the C* community I
> > >> got the impression that the canonical suite (in this case Jenkins)
> > >> might be the right direction to follow.
> > >>
> > >> To be clear, I always check failures seen in any environment and I
> > >> truly believe they are worth checking. I am not advocating skipping
> > >> anything! But I also feel that in many cases CircleCI provides input
> > >> worth tracking that is less likely to reflect product flakes. Am I
> > >> right? In addition, different people use different CircleCI configs
> > >> and see different output. Not to mention flaky tests on a Mac running
> > >> with two cores... Yes, that is sometimes the only way to reproduce
> > >> some of the reported test issues...
> > >>
> > >> So my idea was to suggest that we start tracking one exact Jenkins
> > >> report. Anything reported outside of it would still be checked, but
> > >> could potentially be left for Beta if we don't feel it points to a
> > >> product defect. One more thing to consider is that the big Test epic
> > >> is primarily happening in beta.
> > >>
> > >> Curious to hear what the community thinks about this topic. People
> > >> probably also have additional thoughts based on experience from
> > >> previous releases. How did these things work in the past? Any lessons
> > >> learned? What is our "plan Beta"?
> > >>
> > >> Ekaterina Dimitrova
> > >> e. ekaterina.dimitr...@datastax.com
> > >> w. www.datastax.com
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org