Re: [DISCUSS] Releasable trunk and quality

Brandon Williams Wed, 03 Nov 2021 10:56:27 -0700

On Wed, Nov 3, 2021 at 12:35 PM bened...@apache.org <bened...@apache.org> wrote:
>
> The largest number of test failures turn out (as pointed out by David) to be 
> due to how arcane it was to trigger the full test suite. Hopefully we can get 
> on top of that, but I think a significant remaining issue is a lack of trust 
> in the output of CI. It’s hard to gate commits on a clean CI run when there
> are flaky tests, and it doesn’t take much to misattribute a failing test to
> the existing flakiness (I tend to compare against a run of the trunk
> baseline, but this is burdensome and still error prone). The more flaky
> tests there are, the more likely this is.
>
> This is in my opinion the real cost of flaky tests, and it’s probably worth 
> trying to crack down on them hard if we can. It’s possible the Simulator may 
> help here, when I finally finish it up, as we can port flaky tests to run 
> with the Simulator and the failing seed can then be explored 
> deterministically (all being well).


I totally agree that the lack of trust is a driving problem here, even
in knowing which CI system to rely on. When Jenkins broke but Circle
was fine, we all assumed it was a problem with Jenkins, right up until
Circle also broke.

In testing a distributed system like this I think we're always going
to have failures, even on non-flaky tests, simply because the
underlying infrastructure is variable, with transient failures of its
own (the network is reliable!). We can fix the flakies where the fault
is in the code (and we've done this for many already), but to get more
trustworthy output, I think we're going to need a system that
understands the difference between success, failure, and timeouts, and
in the timeout case knows how to at least mark it differently. The
Simulator may help, as do the in-jvm dtests, but there is ultimately
no way to cover everything without doing some things the hard, more
realistic way where sometimes shit happens, marring the almost-perfect
runs with noisy doubt, which then has to be sifted through to
determine if there was a real issue.
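
To make the three-way distinction concrete, here is a rough sketch of
what I mean, not anything that exists in our CI today. The names
Outcome, run_test and summarize are made up for illustration, and it
assumes each test can be launched as a separate command with a time
budget:

```python
import subprocess
import sys
from enum import Enum


class Outcome(Enum):
    """Hypothetical three-way result, so timeouts are not lumped in with failures."""
    SUCCESS = "success"
    FAILURE = "failure"
    TIMEOUT = "timeout"


def run_test(cmd, timeout_seconds):
    """Run one test command and classify how it ended."""
    try:
        result = subprocess.run(cmd, timeout=timeout_seconds, capture_output=True)
    except subprocess.TimeoutExpired:
        # The process was killed for exceeding its budget: report it separately,
        # since this is often the infrastructure rather than the code under test.
        return Outcome.TIMEOUT
    return Outcome.SUCCESS if result.returncode == 0 else Outcome.FAILURE


def summarize(outcomes):
    """Count outcomes per category so a report can show timeouts on their own line."""
    counts = {outcome: 0 for outcome in Outcome}
    for outcome in outcomes:
        counts[outcome] += 1
    return counts


if __name__ == "__main__":
    # Stand-in commands; a real run would invoke the actual test targets.
    results = [
        run_test([sys.executable, "-c", "pass"], 10),
        run_test([sys.executable, "-c", "import sys; sys.exit(1)"], 10),
        run_test([sys.executable, "-c", "import time; time.sleep(5)"], 0.5),
    ]
    for outcome, count in summarize(results).items():
        print(f"{outcome.value}: {count}")
```

A timeout marked this way still needs a human to look at it, but at
least it is not silently counted as either a pass or a code failure.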

