[ 
https://issues.apache.org/jira/browse/CASSANDRA-12277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393484#comment-15393484
 ] 

Sylvain Lebresne commented on CASSANDRA-12277:
----------------------------------------------

I mostly disagree that most of our flaky tests fail for known and well 
understood reasons (which is what I understood the description to suggest, 
though I might have misinterpreted it). I think that, by and large, our flaky 
tests are flaky for reasons we don't fully understand. They are tests for which 
our educated guess is that it's likely an environment issue, but it's a guess 
nonetheless. The category of tests that flake for very well understood and 
controlled reasons may exist, but is imo comparatively tiny (and thus I just 
don't want to design the handling of flaky tests around that 2nd 
category).

In general though, we're in agreement that 1) flakiness happens and will 
continue to happen and 2) we currently don't deal with it very well.

I'm completely on board with changing how {{@flaky}} works, so that on a 
failure it automatically retries some N times and "passes" as long as it 
didn't fail more than M times (N and M to be defined). It's certainly much 
better than having {{@flaky}} skip tests altogether or fail the whole build, 
no question about that.

But we should still, I think, consider those {{@flaky}} tests for what they are: 
bugs that hopefully should be fixed over time. If we want another annotation to 
distinguish a few tests whose flakiness is well understood and accepted (and 
thus don't need any fixing), fine, but that's comparatively unimportant imo.

> Extend testing infrastructure to handle expected intermittent flaky tests - 
> see ReplicationAwareTokenAllocatorTest.testNewCluster
> ---------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-12277
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12277
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Joshua McKenzie
>            Assignee: Branimir Lambov
>            Priority: Minor
>              Labels: test
>
> From an offline discussion:
> bq. The ReplicationAwareTokenAllocatorTest.testNewCluster failure is a flake 
> -- randomness will sometimes (on the order of 1/100) cause it to fail. 
> Extending the ranges to avoid these flakes goes too far and makes the test 
> meaningless.
> bq. How about instead of @flaky/@Ignore which currently indicates a test that 
> intermittently fails but we do not expect it to, we instead use @tries, or 
> @runs, or some annotation that indicates "run this thing N times, if M pass 
> we're good". This would allow us to keep the current "we don't care about 
> these test results (in context of green test board) because intermittent 
> failures are not expected and the test quality needs shoring up" from "we 
> expect this test to fail sometimes in this particular way."



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
