On Wed, Nov 3, 2021 at 12:35 PM bened...@apache.org <bened...@apache.org> wrote:
>
> The largest number of test failures turn out (as pointed out by David) to be
> due to how arcane it was to trigger the full test suite. Hopefully we can get
> on top of that, but I think a significant remaining issue is a lack of trust
> in the output of CI. It’s hard to gate commit on a clean CI run when there’s
> flaky tests, and it doesn’t take much to misattribute one failing test to the
> existing flakiness (I tend to compare to a run of the trunk baseline for
> comparison, but this is burdensome and still error prone). The more flaky
> tests there are the more likely this is.
>
> This is in my opinion the real cost of flaky tests, and it’s probably worth
> trying to crack down on them hard if we can. It’s possible the Simulator may
> help here, when I finally finish it up, as we can port flaky tests to run
> with the Simulator and the failing seed can then be explored
> deterministically (all being well).

I totally agree that the lack of trust is a driving problem here, even in knowing which CI system to rely on. When Jenkins broke but Circle was fine, we all assumed it was a problem with Jenkins, right up until Circle also broke.

In testing a distributed system like this I think we're always going to have failures, even on non-flaky tests, simply because the underlying infrastructure is variable and has transient failures of its own (the network is reliable!). We can fix the flakies where the fault is in the code (and we've done this for many already), but to get more trustworthy output I think we're going to need a system that understands the difference between success, failure, and timeouts, and in the last case knows how to at least mark them differently.

Simulator may help, as do the in-jvm dtests, but there is ultimately no way to cover everything without doing some things the hard, more realistic way, where sometimes shit happens, marring the almost-perfect runs with noisy doubt, which then has to be sifted through to determine if there was a real issue.
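
To make the success/failure/timeout distinction concrete, here is a rough sketch of the kind of three-way bucketing I have in mind. It is not tied to Jenkins, Circle, or any existing reporting tool; `RunResult`, `Outcome`, `classify`, and `summarize` are all hypothetical names made up for illustration, and it assumes the runner can already tell us whether a test timed out.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    TIMEOUT = "timeout"


@dataclass(frozen=True)
class RunResult:
    """Hypothetical record for one test run; not a real CI schema."""
    name: str
    passed: bool
    timed_out: bool = False


def classify(result: RunResult) -> Outcome:
    # A timeout gets its own category instead of being folded into failure.
    if result.timed_out:
        return Outcome.TIMEOUT
    return Outcome.SUCCESS if result.passed else Outcome.FAILURE


def summarize(results):
    # Group test names by outcome so timeouts can be reported separately.
    buckets = {outcome: [] for outcome in Outcome}
    for result in results:
        buckets[classify(result)].append(result.name)
    return buckets


if __name__ == "__main__":
    runs = [
        RunResult("test_a", passed=True),
        RunResult("test_b", passed=False),
        RunResult("test_c", passed=False, timed_out=True),
    ]
    for outcome, names in summarize(runs).items():
        print(outcome.value, names)
```

With something like this, a gate could block on the failure bucket while flagging the timeout bucket for a rerun or a closer look, rather than lumping both into one red result.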