>
> So my idea was to suggest we start tracking a specific Jenkins report, maybe?

Basing our point of view on the canonical test runs on Apache infra makes
sense to me, assuming that infra is behaving these days. :) I'm pretty sure
Mick got that in working order.

At least for me, what I've learned in the past is that we'd drive to a green
test board, immediately declare the milestone, and then flaky tests would
reappear like a disappointing game of whack-a-mole. They seem frustratingly
ever-present.

I'd personally advocate for us taking the following stance on flaky tests
from this point in the cycle forward:

   - Default posture: label the fix version as beta
   - *Except*, on a case-by-case basis: if a flake could imply a product
   defect that would greatly impair beta testing, we leave it at alpha
   - Move the current flakes to fixver beta
   - Hard, no-compromise position: we don't cut an RC until all flakes are
   dead
   - Use Jenkins as the canonical source of truth for the "is beta ready"
   cutoff (rough tracking sketch below)
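
For concreteness, here is a minimal sketch of what tracking that canonical
Jenkins report could look like mechanically, using only Jenkins' standard
JSON API. The base URL, job name, and number of sampled builds below are
placeholders I'm assuming, not anything we've agreed on:

    #!/usr/bin/env python3
    # Sketch only: summarize failing tests across the last few runs of the
    # canonical Jenkins job. JENKINS and JOB are assumed placeholders.
    import json
    import urllib.error
    import urllib.request
    from collections import Counter

    JENKINS = "https://ci-cassandra.apache.org"  # placeholder base URL
    JOB = "Cassandra-trunk"                      # placeholder job name
    SAMPLE = 10                                  # recent builds to inspect

    def fetch(url):
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)

    def recent_builds(limit):
        data = fetch(f"{JENKINS}/job/{JOB}/api/json?tree=builds[number]")
        return [b["number"] for b in data["builds"][:limit]]

    def failed_tests(build):
        # The JUnit plugin publishes per-build results at .../testReport/api/json
        try:
            report = fetch(f"{JENKINS}/job/{JOB}/{build}/testReport/api/json")
        except urllib.error.HTTPError:
            return set()  # aborted build or no test report published
        failures = set()
        for suite in report.get("suites", []):
            for case in suite.get("cases", []):
                if case["status"] in ("FAILED", "REGRESSION"):
                    failures.add(f'{case["className"]}.{case["name"]}')
        return failures

    if __name__ == "__main__":
        builds = recent_builds(SAMPLE)
        counts = Counter()
        for number in builds:
            counts.update(failed_tests(number))
        for test, n in counts.most_common():
            kind = "consistent failure" if n == len(builds) else "flaky candidate"
            print(f"{n}/{len(builds)}  {kind}  {test}")

Something like this, run over the last handful of builds, would give a rough
"flaky candidate" vs. "consistent failure" split to hang the beta cutoff
discussion on.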

I'm personally balancing the risk of flaky tests confounding beta work
against the value I see in being able to widely signal beta's availability
and encourage widespread user testing. I believe the value of the latter
justifies the risk of the former (I currently perceive that risk as minimal;
I could be wrong). I also put my tolerance for "test failures persist to or
past RC" at zero. That's a hill I'll die on.


On Wed, May 27, 2020 at 5:13 PM Ekaterina Dimitrova <
ekaterina.dimitr...@datastax.com> wrote:

> Dear all,
> I spent some time these days looking into the Release Lifecycle document.
> As we keep saying that we approach Beta based on the Jira board, I was
> curious what exactly the borderline for cutting it is.
>
> Looking at all the latest reports (thanks to everyone who has been working
> on them; I think having an overview of what's going on is always a good
> thing), I have the feeling that the main thing preventing us from cutting
> beta at the moment is flaky tests. According to the lifecycle document:
>
>    - "No flaky tests - All tests (Unit Tests and DTests) should pass
>    consistently. A failing test, upon analyzing the root cause of failure,
>    may be “ignored in exceptional cases”, if appropriate, for the release,
>    after discussion in the dev mailing list."
>
> Now, the related questions that popped into my mind:
> - "ignored in exceptional cases" - examples?
> - No flaky tests according to Jenkins or CircleCI? Also, some people run
> the free tier while others take advantage of premium CircleCI. What should
> the reference framework be?
> - Furthermore, flaky tests at what frequency? (This is a tricky question,
> I know.)
>
> In different conversations with colleagues from the C* community, I got the
> impression that the canonical suite (in this case Jenkins) might be the
> right direction to follow.
>
> To be clear, I always check any failures seen in any environment, and I
> truly believe they are worth checking. I am not advocating that we skip
> anything!  But also, I feel that in many cases CircleCI can provide input
> worth tracking that is less likely to point to a product flake. Am I
> right? In addition, different people use different CircleCI configs and see
> different output. Not to mention flaky tests on a Mac running with two
> cores... Yes, that is sometimes the only way to reproduce some of the
> reported test issues...
>
> So my idea was to suggest we start tracking a specific Jenkins report,
> maybe? Anything reported out of it would also be checked, but could
> potentially be left for Beta if we don't feel it shows a product defect.
> One more thing to consider is that the big Test epic is primarily happening
> in beta.
>
> Curious to hear what the community thinks about this topic. People probably
> also have additional thoughts based on experience from previous releases.
> How did these things work in the past? Any lessons learned? What is our
> "plan Beta"?
>
> Ekaterina Dimitrova
> e. ekaterina.dimitr...@datastax.com
> w. www.datastax.com
>
