[
https://issues.apache.org/jira/browse/CASSANDRA-12277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393484#comment-15393484
]
Sylvain Lebresne commented on CASSANDRA-12277:
----------------------------------------------
I'm mostly disagreeing that most of our flaky tests fail for known and well
understood reasons (which is what I understood the description to suggest, though
I might have misinterpreted it). I think by and large, our flaky tests are flaky
for reasons we don't fully understand. They are tests for which our educated
guess is that it's likely an environment issue, but it's a guess
nonetheless. The category of tests that flake for very well understood and
controlled reasons may exist, but is imo comparatively tiny (and thus I just
don't want to design the handling of flaky tests around that 2nd
category).
In general though, we're in agreement that 1) flakiness happens and will
continue to happen and 2) we currently don't deal with it very well.
I'm completely on board with changing how {{@flaky}} works, so that on a failure
it automatically retries some N times and "passes" as long as it didn't fail more
than M times (N and M to be defined). It's certainly much better than having
{{@flaky}} skip tests altogether or fail the whole build, no question about
that.
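To make the retry semantics concrete, here is a minimal sketch in Python of one way to read them. It is not the existing {{@flaky}} implementation: the decorator name {{flaky}} and the parameters {{retries}} (N) and {{max_failures}} (M) are hypothetical, and it assumes a test signals failure by raising {{AssertionError}}. A test that passes on its first run is not retried; after a first failure it is rerun up to N times and only fails for real once the failure count exceeds M.

```python
import functools


def flaky(retries, max_failures):
    """Hypothetical decorator: rerun a failing test and tolerate a few failures.

    retries      -- N, the number of extra runs triggered by a first failure
    max_failures -- M, the number of failures tolerated across all runs
    """
    def decorator(test):
        @functools.wraps(test)
        def wrapper(*args, **kwargs):
            failures = 0
            for _ in range(1 + retries):
                try:
                    test(*args, **kwargs)
                except AssertionError:
                    failures += 1
                    if failures > max_failures:
                        # More than M failures: surface the latest one.
                        raise
                else:
                    if failures == 0:
                        # Passed on the first run, nothing to retry.
                        return
            # All runs done with at most M failures: treat as a pass.
        return wrapper
    return decorator


if __name__ == "__main__":
    runs = {"count": 0}

    @flaky(retries=4, max_failures=1)
    def fails_only_on_first_run():
        runs["count"] += 1
        assert runs["count"] != 1

    fails_only_on_first_run()  # tolerated: 1 failure out of 5 runs
    print("runs executed:", runs["count"])
```

Even with something like this in place, each tolerated failure would presumably still want to be reported somewhere, so the flakiness stays visible rather than being silently absorbed.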
But we should still, I think, consider those {{@flaky}} tests for what they are:
bugs that hopefully should be fixed over time. If we want another annotation to
distinguish a few tests whose flakiness is well understood and accepted (and
thus don't need any fixing), fine, but that's comparatively unimportant imo.
> Extend testing infrastructure to handle expected intermittent flaky tests -
> see ReplicationAwareTokenAllocatorTest.testNewCluster
> ---------------------------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-12277
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12277
> Project: Cassandra
> Issue Type: Bug
> Reporter: Joshua McKenzie
> Assignee: Branimir Lambov
> Priority: Minor
> Labels: test
>
> From an offline discussion:
> bq. The ReplicationAwareTokenAllocatorTest.testNewCluster failure is a flake
> -- randomness will sometimes (on the order of 1/100) cause it to fail.
> Extending the ranges to avoid these flakes goes too far and makes the test
> meaningless.
> bq. How about instead of @flaky/@Ignore which currently indicates a test that
> intermittently fails but we do not expect it to, we instead use @tries, or
> @runs, or some annotation that indicates "run this thing N times, if M pass
> we're good". This would allow us to keep the current "we don't care about
> these test results (in context of green test board) because intermittent
> failures are not expected and the test quality needs shoring up" separate
> from "we expect this test to fail sometimes in this particular way."
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)