Re: How Apache Cassandra handles flaky tests

Stanislav Kozlovski Fri, 01 Mar 2019 07:05:14 -0800

Those are all helpful tips and all make complete sense to me. Thank you very 
much for sharing your experience! :)


On 2019/02/27 00:55:34, "d...@yahoo.com.INVALID" <d...@yahoo.com.INVALID> 
wrote: 
> +1 to everything Jeff said. As someone who has worked on flaky tests not just
> in Cassandra's context, I know it can be hard to deal with them.
>
> However, it's best to root-cause them. I have found that some flaky tests were
> genuine issues that needed fixing in Cassandra. Sometimes the flakiness is
> due to underpowered VMs running low on resources, or, in one case, tests failed
> because kernel settings differed between systems. Explore tuning the VM
> settings used for test execution. I usually prefer not to add retries,
> but in some cases retries can be helpful. Rewriting the tests to reduce
> dependencies on external systems, or using mocks, is another useful way of
> reducing flakiness. Try breaking up tests if they're too big. Finally,
> deleting tests can also be a solution, but use it sparingly.
>
> I believe in the broken windows theory, so it is critical that you spend
> time fixing them; otherwise everyone ignores them and attributes all failures to
> "flakiness", leading to real issues sneaking in.
>
> Dinesh
> 
> On Tuesday, February 26, 2019, 12:06:10 PM PST, Jeff Jirsa
> <jj...@gmail.com> wrote:
>
> > On Feb 26, 2019, at 8:26 AM, Stanislav Kozlovski <st...@outlook.com>
> > wrote:
> >
> > Hey there Cassandra community,
> >
> > I work on a fellow open-source project - Apache Kafka - and there we have 
> > been fighting flaky tests a lot. We run Java 8 and Java 11 builds on every 
> > Pull Request and due to test flakiness, almost all of them turn out red 
> > with 1 or 2 tests (completely unrelated to the change in the PR) failing. 
> > This has resulted in committers ignoring them and merging the changes 
> > either way, or in the worst case - rerunning the hour-long build until it 
> > becomes green.
> 
> I hope most committers won't commit unless the flaky test is definitely not
> in the subsystem they touched. But yes, one of the motivations for speeding
> up tests (parallelized on a containerized hosted CI platform) was to cut down
> the time for (re-)running.
>
> > This test flakiness has also slowed down our releases significantly.
> >
> > In general, I was just curious to understand if this is a problem that 
> > Cassandra faces as well.
>
> Yes
>
> > Does your project have a lot of intermittently failing tests,
> 
> Sometimes more than others. There were a few big pushes to get green, though
> it naturally regresses a bit over time.
> 
> > do you have any active process of addressing such tests (during the initial 
> > review, after realizing it is flaky, etc). Any pointers will be greatly 
> > appreciated!
> 
> I don’t think we’ve solved this convincingly. Different large (corporate)
> contributors have done long one-time passes, and that helped a ton, but I
> don’t think there are any silver bullets yet.
