> Unit tests that fail consistently but only on one configuration should not
> be removed/replaced until the replacement also catches the failure.

> along the way, people have decided a certain configuration deserves
> additional testing, and it has been done this way in lieu of any other more
> efficient approach.

Totally agree with these sentiments, as well as with the framing of our
current unit tests as "bad fuzz-tests thanks to non-determinism".

To me, this reinforces my stance on a "pre-commit vs. post-commit" approach
to testing, *with our current constraints*:

• Test the default configuration on all supported JDKs pre-commit
• Post-commit, treat *consistent* failures on non-default configurations as
  immediate interrupts to the author that introduced them
• Pre-release, push for no consistent failures on any suite in any
  configuration, and no regressions in flaky tests from the prior release
  (in the ASF CI env)

I think there's value in having the non-default configurations, but I'm not
convinced the benefits outweigh the costs *specifically in terms of
pre-commit work*, due to flakiness in the execution of the software env
itself, not to mention hardware env variance on the ASF side today.

All that said - if we got to a world where we could run our jvm-based tests
deterministically within the simulator, my intuition is that we'd see a lot
of the test-specific, non-defect flakiness reduced drastically. In such a
world I'd be in favor of running :allthethings: pre-commit, as we'd have
*much* higher confidence that those failures were actually attributable to
the author of whatever diff the test is run against.
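To make the determinism point concrete, here's a minimal sketch of the
property we'd be buying, under the assumption that every source of
randomness in a test flows from a single, logged seed. This is illustrative
only - it is not the Cassandra Simulator API, and the `test.seed` property
name is hypothetical - but it shows why a failure under such a regime
becomes a replayable input rather than a fuzz event:

```java
import java.util.Random;

import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertArrayEquals;

// Minimal sketch (hypothetical, not the Cassandra Simulator): all randomness
// in the test derives from one seed, and the seed is logged on entry, so any
// CI failure can be replayed exactly by re-running with -Dtest.seed=<value>.
public class SeededDeterminismTest
{
    // Fixed by default so CI runs are repeatable; overridable to replay a
    // seed that a previous run reported as failing.
    private static final long SEED = Long.getLong("test.seed", 42L);

    @Test
    public void reverseTwiceIsIdentity()
    {
        System.out.println("test.seed=" + SEED); // log the seed for replay
        Random random = new Random(SEED);

        int[] original = random.ints(1_000).toArray();
        int[] copy = original.clone();

        reverse(copy);
        reverse(copy);

        // Reversing twice must restore the input, whatever the seed produced.
        assertArrayEquals(original, copy);
    }

    private static void reverse(int[] a)
    {
        for (int i = 0, j = a.length - 1; i < j; i++, j--)
        {
            int tmp = a[i];
            a[i] = a[j];
            a[j] = tmp;
        }
    }
}
```

My understanding is the simulator would go much further than this -
controlling scheduling and time, not just `Random` - but the payoff is the
same: a red build points at a specific, re-runnable execution, which is what
makes attributing a failure to a diff's author tractable.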
On Fri, Dec 8, 2023, at 8:25 AM, Mick Semb Wever wrote:
>
>>> I think everyone agrees here, but… these variations are still catching
>>> failures, and until we have an improvement or replacement we do rely on
>>> them. I'm not in favour of removing them until we have proof/confidence
>>> that any replacement is catching the same failures. Especially oa, tries,
>>> vnodes. (Not tries and offheap is being replaced with "latest", which will
>>> be a valuable simplification.)
>>
>> What kind of proof do you expect? I cannot imagine how we could prove that,
>> because the ability to detect failures results from the randomness of those
>> tests. That's why when such a test fails you usually cannot reproduce it
>> easily.
>
> Unit tests that fail consistently but only on one configuration should not
> be removed/replaced until the replacement also catches the failure.
>
>> We could extrapolate that to - why do we only have those configurations?
>> Why don't we test trie / oa + compression, or CDC, or system memtable?
>
> Because, along the way, people have decided a certain configuration deserves
> additional testing, and it has been done this way in lieu of any other more
> efficient approach.