[
https://issues.apache.org/jira/browse/CASSANDRA-12277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393397#comment-15393397
]
Sylvain Lebresne commented on CASSANDRA-12277:
----------------------------------------------
bq. If we, collectively, don't have the discipline not to abuse something like
this, we have bigger problems.
We're all sometimes lazy, and it's not specific to this project, so I think not
making it too easy to do the wrong thing (here, blindly ignoring a rare flaky
test as long as it doesn't fail too often) is just smart project management.
Also, testing is the one area where I think we've historically been pretty bad
at discipline, so while I'm all for improving our ways, I'm going to object to
assuming discipline until I consider we've collectively and consistently shown
discipline for a reasonable length of time.
bq. If we leave a precise comment in the code, we still have a non-green
test-board and the cognitive burden of filtering out "known flaky" failures
when checking test results.
You misunderstood what I meant. I'm not saying we let the test flake, I'm
saying we manually modify the test to "run this thing N times, if M pass we're
good" (and clearly explain why it's OK to do so), but without bothering to add
infrastructure for it.
But let me also be clear I'm not suggesting we'd manually modify tests that way
on any regular basis, that would be stupid.
Basically, I feel we're conflating two things here. It seems to me
{{ReplicationAwareTokenAllocatorTest.testNewCluster}} is (potentially) a rare
case where we *understand* why the test is flaky, but where 1) fixing the
flakiness is not worth it and 2) we decide that we understand the flakiness
well enough that we can trust the test to provide value _even_ if we ignore a
few failed runs.
But I can't really see that being anything other than a very, very rare
situation (I'm not even entirely saying I'm fine with this one). So I don't
think we should base our infrastructure for handling flaky tests on those
assumptions.
In general, a flaky test is a bug (probably in the test, but still), and we
should identify the reason for the flakiness and fix it. I'm fine with marking
tests {{@flaky}} temporarily (when our educated guess is that it's probably a
test problem), while we find time to fix them properly, so they don't clutter
the test result board. But I'm not convinced we should *replace* that with a
different annotation that considers a flaky test fine as long as it doesn't
fail too often, which is what I understand from the description of this ticket.
> Extend testing infrastructure to handle expected intermittent flaky tests -
> see ReplicationAwareTokenAllocatorTest.testNewCluster
> ---------------------------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-12277
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12277
> Project: Cassandra
> Issue Type: Bug
> Reporter: Joshua McKenzie
> Assignee: Branimir Lambov
> Priority: Minor
> Labels: test
>
> From an offline discussion:
> bq. The ReplicationAwareTokenAllocatorTest.testNewCluster failure is a flake
> -- randomness will sometimes (on the order of 1/100) cause it to fail.
> Extending the ranges to avoid these flakes goes too far and makes the test
> meaningless.
> bq. How about, instead of @flaky/@Ignore, which currently indicates a test
> that intermittently fails but that we do not expect to, we use @tries,
> @runs, or some annotation that indicates "run this thing N times, if M pass
> we're good". This would allow us to distinguish the current "we don't care
> about these test results (in the context of a green test board) because
> intermittent failures are not expected and the test quality needs shoring
> up" from "we expect this test to fail sometimes in this particular way."
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)