[
https://issues.apache.org/jira/browse/CASSANDRA-12277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393397#comment-15393397
]
Sylvain Lebresne commented on CASSANDRA-12277:
----------------------------------------------
bq. If we, collectively, don't have the discipline not to abuse something like
this, we have bigger problems.
We're all sometimes lazy, and it's not specific to this project, so I think not
making it too easy to do the wrong thing (here, blindly ignoring a rare flaky
test as long as it doesn't fail too often) is just smart project management.
Also, testing is the one area where I think we've historically been pretty bad
at discipline, so while I'm all for improving our ways, I'm going to object to
assuming discipline until I consider we've collectively and consistently shown
discipline for a reasonable length of time.
bq. If we leave a precise comment in the code, we still have a non-green
test-board and the cognitive burden of filtering out "known flaky" failures
when checking test results.
You misunderstood what I meant. I'm not saying we let the test flake, I'm
saying we manually modify the test to "run this thing N times, if M pass we're
good" (and clearly explain why it's OK to do so), but without bothering to add
infrastructure for it.
But let me also be clear I'm not suggesting we'd manually modify tests that way
on any regular basis, that would be stupid.
Basically, I feel we're conflating two things here. It seems to me
{{ReplicationAwareTokenAllocatorTest.testNewCluster}} is (potentially) a rare
case where we *understand* why the test is flaky, but where 1) fixing the
flakiness is not worth it and 2) we decide that we understand the flakiness
well enough that we can trust the test to provide value _even_ if we ignore a
few failed runs.
But I can't really see that being anything other than a very, very rare
situation (I'm not even entirely saying I'm fine with this one). So I don't
think we should base our infrastructure for handling flaky tests on those
assumptions.
In general, a flaky test is a bug (probably in the test, but still), and we
should identify the reason for the flakiness and fix it. I'm fine with marking
tests {{@flaky}} temporarily (when our educated guess is that it's probably a
test problem), while we find time to fix them properly, so they don't clutter
the test result board. But I'm not convinced we should *replace* that with a
different annotation that considers a flaky test fine as long as it doesn't
fail too often, which is what I understand from the description of this ticket.
> Extend testing infrastructure to handle expected intermittent flaky tests -
> see ReplicationAwareTokenAllocatorTest.testNewCluster
> ---------------------------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-12277
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12277
> Project: Cassandra
> Issue Type: Bug
> Reporter: Joshua McKenzie
> Assignee: Branimir Lambov
> Priority: Minor
> Labels: test
>
> From an offline discussion:
> bq. The ReplicationAwareTokenAllocatorTest.testNewCluster failure is a flake
> -- randomness will sometimes (on the order of 1/100) cause it to fail.
> Extending the ranges to avoid these flakes goes too far and makes the test
> meaningless.
> bq. How about, instead of @flaky/@Ignore, which currently indicates a test
> that intermittently fails but that we do not expect to, we use @tries,
> @runs, or some annotation that indicates "run this thing N times, if M pass
> we're good". This would allow us to distinguish the current "we don't care
> about these test results (in the context of a green test board) because
> intermittent failures are not expected and the test quality needs shoring
> up" from "we expect this test to fail sometimes in this particular way."
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)