Surefire supports automatically re-running failing tests, [as described here](https://maven.apache.org/surefire/maven-surefire-plugin/examples/rerun-failing-tests.html); let's see if this can help out at all.
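Per the linked Surefire docs, the option in question is `rerunFailingTestsCount`; a minimal sketch of wiring it into the plugin might look like the following (the retry count of 2 is just a placeholder):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- re-run each failing test up to 2 more times before reporting it as failed -->
    <rerunFailingTestsCount>2</rerunFailingTestsCount>
  </configuration>
</plugin>
```

The same thing can be set from the command line via `mvn -Dsurefire.rerunFailingTestsCount=2 test`, which may be handy for experimenting before committing anything.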

My gut tells me that this is not a good idea to do in practice, but the false failure rate due to known flaky tests has seemed pretty bad lately. It could potentially be scoped to `druid-api`, `druid-common`, `druid-processing`, `druid-server`, `druid-indexing-service`, and `druid-kafka-indexing-service` to cover all currently open issues with the flaky-test label.

I'd much prefer if we could just set an annotation on the specific tests that we know to be flaky and only allow this behavior on those, but I haven't turned up anything that does that yet. Ideally we would fix them all, but some of these have been failing on occasion for years now, and I know that I and many others have spent a fair chunk of time trying to resolve them, to no avail. Many of these tests cover rather complicated behaviors and interactions, so the flaky behavior is generally worth putting up with for the value they provide; disabling them isn't an option either.
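For illustration, the annotation-gated retry idea could probably be built as a custom JUnit 4 `TestRule` wrapping the test `Statement`. The core loop, stripped down to plain Java so it stands alone, would amount to something like this (everything here — the class name, `runWithRetry`, `maxAttempts` — is hypothetical, not an existing Surefire or JUnit API):

```java
import java.util.concurrent.Callable;

// Sketch of the retry loop a hypothetical @Retry-style TestRule would wrap
// around a test body: re-run the body up to maxAttempts times, and rethrow
// the last failure only if every attempt fails. (Assumption: this mirrors
// what a real JUnit 4 TestRule's Statement.evaluate() override would do.)
public class RetrySketch {
    static <T> T runWithRetry(int maxAttempts, Callable<T> testBody) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return testBody.call();
            } catch (Exception e) {
                last = e;
                System.err.println("Attempt " + attempt + " failed: " + e.getMessage());
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulated flaky test: fails twice, then succeeds on the third run.
        int[] calls = {0};
        String result = runWithRetry(3, () -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("flaky failure");
            }
            return "passed on attempt " + calls[0];
        });
        System.out.println(result);
    }
}
```

The upside of this shape over a blanket Surefire setting is that only tests explicitly opted in via the annotation would ever be retried, so a newly flaky test would still fail loudly.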

On the upside, this option will hopefully result in less time spent re-triggering the same (mostly `curator-test` based) tests over and over until they pass. An obvious downside is a lower chance of resolving flaky tests once and for all, since catching new ones would require checking the test results manually, and it becomes easier to ignore existing issues if they don't explode on you often.

Additionally, this might not help at all, since retries might push us over the timeout limit. I suggest we re-run the tests on this PR many times to check whether it's effective at all. [Also, this option is only for JUnit 4.x](https://github.com/junit-team/junit5/issues/1558), just to add to the pile of things to consider.

[ Full content available at: 
https://github.com/apache/incubator-druid/pull/6324 ]