(This response ended up being a bit longer than intended; sorry about that)

> What is more common though is packaging errors,
> cdc/compression/system_ks_directory targeted fixes, CI w/wo
> upgrade tests, being less responsive post-commit as you already
> moved on
Two that *should* be resolved in the new regime:
* Packaging errors should be caught pre-commit since we're making the artifact 
builds part of the pre-commit suite.
* I'm hoping to merge the commit log segment allocation so the CDC allocator is 
the only one for 5.0 (and it just bypasses the CDC-related work on allocation 
when CDC is disabled, thus not impacting perf); the existing targeted testing of 
CDC-specific functionality should be sufficient to confirm its correctness, 
since it doesn't vary from the primary allocation path when it comes to mutation 
space in the buffer (rough sketch of the shape I mean after this list).
* Upgrade tests are going to be part of the pre-commit suite
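
For the CDC allocator merge, this is roughly the shape I have in mind -- a 
hypothetical sketch only, the class and method names are illustrative and not 
the actual commit log internals: a single allocation path where the non-CDC 
case pays exactly one branch.

    import java.util.concurrent.atomic.AtomicLong;

    // Hypothetical sketch only: names are illustrative, not the real
    // org.apache.cassandra commit log classes.
    public class UnifiedSegmentAllocator
    {
        private final boolean cdcEnabled;
        private final AtomicLong bufferPosition = new AtomicLong();
        private final AtomicLong cdcBytesTracked = new AtomicLong();

        public UnifiedSegmentAllocator(boolean cdcEnabled)
        {
            this.cdcEnabled = cdcEnabled;
        }

        /** Returns the start offset reserved for a mutation of the given size. */
        public long allocate(int mutationSize)
        {
            // Primary path: identical whether or not CDC is enabled, so the
            // mutation space reserved in the buffer never varies.
            long offset = bufferPosition.getAndAdd(mutationSize);

            // CDC bookkeeping is bypassed entirely when disabled; the non-CDC
            // case pays only this single branch.
            if (cdcEnabled)
                cdcBytesTracked.addAndGet(mutationSize);

            return offset;
        }

        public long cdcBytesTracked()
        {
            return cdcBytesTracked.get();
        }
    }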

Outstanding issues:
* Compression: if we just run with defaults we won't test all cases, so errors 
could pop up here.
* system_ks_directory related things: is this still ongoing, or did we have a 
transient burst of these types of issues? And would we expect these to vary 
based on different JDKs, non-default configurations, etc.?
* Being less responsive post-commit: my only ideas here are a combination of 
the jenkins_jira_integration script 
<https://github.com/apache/cassandra-builds/blob/trunk/jenkins-jira-integration/jenkins_jira_integration.py>
 updating the JIRA ticket with test results if you cause a regression, plus us 
building a muscle around reverting your commit if it breaks tests.
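
For that notification piece, the shape I'm picturing is just "post-commit run 
goes red, a comment lands on the originating ticket". A sketch only -- the real 
jenkins_jira_integration script is Python with its own logic; the ticket key, 
JIRA URL, and credentials below are placeholders -- using JIRA's comment REST 
endpoint:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    // Sketch of the idea only; ticket key, base URL, and token are placeholders.
    public class RegressionNotifier
    {
        public static void main(String[] args) throws Exception
        {
            String jiraBase = "https://issues.apache.org/jira";  // placeholder
            String ticket = "CASSANDRA-XXXXX";                    // placeholder
            String body = "{\"body\": \"Post-commit run: a previously green test "
                        + "is now consistently failing after this commit.\"}";

            // POST a comment to the issue that introduced the regression.
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(jiraBase + "/rest/api/2/issue/" + ticket + "/comment"))
                    .header("Content-Type", "application/json")
                    .header("Authorization", "Bearer " + System.getenv("JIRA_TOKEN"))
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("JIRA responded: " + response.statusCode());
        }
    }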

To quote Jacek:
> why don't run dtests w/wo sstable compression x w/wo internode encryption x 
> w/wo vnodes, 
> w/wo off-heap buffers x j8/j11/j17 x w/wo CDC x RedHat/Debian/SUSE, etc. I 
> think this is a matter of cost vs result. 
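
Taking that list at face value: five binary dimensions (compression, internode 
encryption, vnodes, off-heap buffers, CDC) x three JDKs x three distros is 
2^5 x 3 x 3 = 288 configurations per suite, which makes the cost side of that 
equation pretty stark.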

I think we've organically made these decisions and tradeoffs in the past 
without being methodical about it. If we can:
1. Multiplex changed or new tests
2. Tighten the feedback loop of "tests were green, now they're *consistently* 
not, you're the only one who changed something", and
3. Instill a culture of "if you can't fix it immediately revert your commit"

Then I think we'll only be vulnerable to flaky failures introduced across 
different non-default configurations as side effects in tests that aren't 
touched, which *intuitively* feels like a lot less than what we're facing today. 
We could even get clever as a day 2 effort and define the packages in the 
primary codebase where changes take place, then multiplex (on a smaller scale) 
their respective packages of unit tests if we see problems in this area.
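
For that day 2 idea, the selection step could be as mechanical as deriving the 
touched packages from the diff and handing them to whatever does the 
multiplexing. A rough sketch -- the source path layout and upstream branch name 
are assumptions here, not an existing build target:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.util.LinkedHashSet;
    import java.util.Set;

    // Sketch: list the Java packages a patch touches so their unit tests
    // can be multiplexed on a smaller scale.
    public class ChangedPackages
    {
        public static void main(String[] args) throws Exception
        {
            Process diff = new ProcessBuilder("git", "diff", "--name-only", "origin/trunk...HEAD")
                           .redirectErrorStream(true)
                           .start();

            Set<String> packages = new LinkedHashSet<>();
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(diff.getInputStream())))
            {
                String path;
                while ((path = reader.readLine()) != null)
                {
                    // e.g. src/java/org/apache/cassandra/db/Foo.java -> org.apache.cassandra.db
                    int lastSlash = path.lastIndexOf('/');
                    if (path.startsWith("src/java/") && path.endsWith(".java") && lastSlash > "src/java/".length())
                        packages.add(path.substring("src/java/".length(), lastSlash).replace('/', '.'));
                }
            }

            // Each package listed here would get its unit tests repeated.
            packages.forEach(System.out::println);
        }
    }

The point being that the mapping from "what changed" to "which unit tests get 
extra repetitions" can be derived from the diff rather than left as a judgment 
call.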

Flaky tests are a giant pain in the ass and a huge drain on productivity, 
don't get me wrong. *And* we have to balance how much cost we pay before each 
commit against the benefit we expect to gain from it. I don't take the past as 
strongly indicative of the future here, since we've been allowing Circle to 
validate pre-commit and haven't been multiplexing.

Does the above make sense? Are there things you've seen in the trenches that 
challenge or invalidate any of those perspectives?

On Wed, Jul 12, 2023, at 7:28 AM, Jacek Lewandowski wrote:
> Isn't novnodes a special case of vnodes with n=1 ?
> 
> We should rather select a subset of tests for which it makes sense to run 
> with different configurations. 
> 
> The set of configurations against which we run the tests currently is still 
> only the subset of all possible cases. 
> I could ask - why don't run dtests w/wo sstable compression x w/wo internode 
> encryption x w/wo vnodes, 
> w/wo off-heap buffers x j8/j11/j17 x w/wo CDC x RedHat/Debian/SUSE, etc. I 
> think this is a matter of cost vs result. 
> This equation contains the likelihood of failure in configuration X, given 
> there was no failure in the default 
> configuration, the cost of running those tests, the time we delay merging, 
> the likelihood that we wait for 
> the test results so long that our branch diverge and we will have to rerun 
> them or accept the fact that we merge 
> a code which was tested on outdated base. Eventually, the overall new 
> contributors experience - whether they 
> want to participate in the future.
> 
> 
> 
> śr., 12 lip 2023 o 07:24 Berenguer Blasi <berenguerbl...@gmail.com> 
> napisał(a):
>> On our 4.0 release I remember a number of such failures but not recently. 
>> What is more common though is packaging errors, 
>> cdc/compression/system_ks_directory targeted fixes, CI w/wo upgrade tests, 
>> being less responsive post-commit as you already moved on,... Either the 
>> smoke pre-commit has approval steps for everything or we should give imo a 
>> devBranch alike job to the dev pre-commit. I find it terribly useful. My 
>> 2cts.
>> 
>> On 11/7/23 18:26, Josh McKenzie wrote:
>>>> 2: Pre-commit 'devBranch' full suite for high risk/disruptive merges: at 
>>>> reviewer's discretion
>>> In general, maybe offering a dev the option of choosing either "pre-commit 
>>> smoke" or "post-commit full" at their discretion for any work would be the 
>>> right play.
>>> 
>>> A follow-on thought: even with something as significant as Accord, TCM, 
>>> Trie data structures, etc - I'd be a bit surprised to see tests fail on 
>>> JDK17 that didn't on 11, or with vs. without vnodes, in ways that weren't 
>>> immediately clear the patch stumbled across something surprising and was 
>>> immediately trivially attributable if not fixable. *In theory* the things 
>>> we're talking about excluding from the pre-commit smoke test suite are all 
>>> things that are supposed to be identical across environments and thus 
>>> opaque / interchangeable by default (JDK version outside checking build 
>>> which we will, vnodes vs. non, etc).
>>> 
>>> Has that not proven to be the case in your experience?
>>> 
>>> On Tue, Jul 11, 2023, at 10:15 AM, Derek Chen-Becker wrote:
>>>> A strong +1 to getting to a single CI system. CircleCI definitely has some 
>>>> niceties and I understand why it's currently used, but right now we get 2 
>>>> CI systems for twice the price. +1 on the proposed subsets.
>>>> 
>>>> Derek
>>>> 
>>>> On Mon, Jul 10, 2023 at 9:37 AM Josh McKenzie <jmcken...@apache.org> wrote:
>>>>> 
>>>>> I'm personally not thinking about CircleCI at all; I'm envisioning a 
>>>>> world where all of us have 1 CI *software* system (i.e. reproducible on 
>>>>> any env) that we use for pre-commit validation, and then post-commit 
>>>>> happens on reference ASF hardware.
>>>>> 
>>>>> So:
>>>>> 1: Pre-commit subset of tests (suites + matrices + env) runs. On green, 
>>>>> merge.
>>>>> 2: Post-commit tests (all suites, matrices, env) runs. If failure, link 
>>>>> back to the JIRA where the commit took place
>>>>> 
>>>>> Circle would need to remain in lockstep with the requirements for point 1 
>>>>> here.
>>>>> 
>>>>> On Mon, Jul 10, 2023, at 1:04 AM, Berenguer Blasi wrote:
>>>>>> +1 to Josh which is exactly my line of thought as well. But that is only 
>>>>>> valid if we have a solid Jenkins that will eventually run all test 
>>>>>> configs. So I think I lost track a bit here. Are you proposing:
>>>>>> 
>>>>>> 1- CircleCI: Run pre-commit a single (the most common/meaningful, TBD) 
>>>>>> config of tests
>>>>>> 
>>>>>> 2- Jenkins: Runs post-commit _all_ test configs and emails/notifies you 
>>>>>> in case of problems?
>>>>>> 
>>>>>> Or sthg different like having 1 also in Jenkins?
>>>>>> 
>>>>>> On 7/7/23 17:55, Andrés de la Peña wrote:
>>>>>>> I think 500 runs combining all configs could be reasonable, since it's 
>>>>>>> unlikely to have config-specific flaky tests. As in five configs with 
>>>>>>> 100 repetitions each.
>>>>>>> 
>>>>>>> On Fri, 7 Jul 2023 at 16:14, Josh McKenzie <jmcken...@apache.org> wrote:
>>>>>>>> Maybe. Kind of depends on how long we write our tests to run doesn't 
>>>>>>>> it? :)
>>>>>>>> 
>>>>>>>> But point taken. Any non-trivial test would start to be something of a 
>>>>>>>> beast under this approach.
>>>>>>>> 
>>>>>>>> On Fri, Jul 7, 2023, at 11:12 AM, Brandon Williams wrote:
>>>>>>>>> On Fri, Jul 7, 2023 at 10:09 AM Josh McKenzie <jmcken...@apache.org> 
>>>>>>>>> wrote:
>>>>>>>>> > 3. Multiplexed tests (changed, added) run against all JDK's and a 
>>>>>>>>> > broader range of configs (no-vnode, vnode default, compression, etc)
>>>>>>>>> 
>>>>>>>>> I think this is going to be too heavy...we're taking 500 iterations
>>>>>>>>> and multiplying that by like 4 or 5?
>>>>>>>>> 
>>>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> +---------------------------------------------------------------+
>>>> | Derek Chen-Becker                                             |
>>>> | GPG Key available at https://keybase.io/dchenbecker and       |
>>>> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
>>>> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
>>>> +---------------------------------------------------------------+
>>>> 
>>> 
