I believe some tools can use code coverage analysis to determine which tests
make sense to multiplex, given the exact lines of code that were changed.
After the initial run we should have data from the coverage analysis telling
us which test classes are tainted - that is, which ones cover the modified
code fragments.
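
To make that concrete, here is a rough sketch (not an existing tool) of the
kind of selection step I have in mind. It assumes a hypothetical per-test
coverage map - a JSON file mapping each test class to the source files it
covers, e.g. derived from per-test JaCoCo output on the previous run - and
simply intersects it with the files changed on the branch:

#!/usr/bin/env python3
# Hypothetical sketch: pick "tainted" test classes by intersecting the files
# changed on a branch with a per-test coverage map collected on an earlier run.
# The JSON shape and the "origin/trunk" base ref are assumptions for illustration.
import json
import subprocess
import sys


def changed_files(base_ref="origin/trunk"):
    """Files modified on the current branch relative to base_ref."""
    out = subprocess.run(
        ["git", "diff", "--name-only", base_ref, "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return {line.strip() for line in out.splitlines() if line.strip()}


def tainted_tests(coverage_map_path, changed):
    """Test classes whose recorded coverage touches any changed file."""
    with open(coverage_map_path) as f:
        # e.g. {"org.apache.cassandra.SomeTest": ["src/java/SomeClass.java", ...]}
        coverage_map = json.load(f)
    return sorted(
        test for test, covered in coverage_map.items()
        if changed.intersection(covered)
    )


if __name__ == "__main__":
    coverage_map_file = sys.argv[1] if len(sys.argv) > 1 else "per_test_coverage.json"
    for test_class in tainted_tests(coverage_map_file, changed_files()):
        print(test_class)

The output is exactly the list of test classes we would feed to the
multiplexer; the real mapping would of course come from whatever per-test
coverage we already collect in CI.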

Using a similar approach, we could detect the coverage differences between
runs with and without, say, compression, and discover which tests cover those
parts of the code.
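
Again just a sketch of what I mean, assuming we have aggregate line-coverage
dumps from two runs of the same suite (one per configuration, in a simple JSON
form - the format here is made up) plus the per-test map from above: diff the
two dumps to find the config-sensitive source files, then pick the tests that
touch them:

# Hypothetical sketch: find source files whose coverage differs between two
# configuration runs (e.g. with and without sstable compression) and list the
# tests that exercise them. All file formats here are assumptions.
import json


def covered_lines(path):
    """Load {"source file": [covered line numbers, ...]} from a JSON dump."""
    with open(path) as f:
        return {src: set(lines) for src, lines in json.load(f).items()}


def config_sensitive_files(run_a_path, run_b_path):
    """Source files whose covered lines differ between the two runs."""
    a, b = covered_lines(run_a_path), covered_lines(run_b_path)
    return {src for src in a.keys() | b.keys()
            if a.get(src, set()) != b.get(src, set())}


def tests_worth_multiplexing(per_test_map_path, sensitive_files):
    """Tests whose recorded coverage touches any config-sensitive file."""
    with open(per_test_map_path) as f:
        per_test = json.load(f)  # {"test class": ["covered source file", ...]}
    return sorted(test for test, files in per_test.items()
                  if sensitive_files.intersection(files))

Those tests are the ones worth running in both configurations; everything else
should behave identically regardless of the setting.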

That way we can be smart and save time by pointing precisely at what it makes
sense to test more thoroughly.


On Wed, 12 Jul 2023 at 14:52 Jacek Lewandowski <lewandowski.ja...@gmail.com>
wrote:

> Would it be re-opening the ticket or creating a new ticket with "revert of
> fix"?
>
>
>
> On Wed, 12 Jul 2023 at 14:51 Ekaterina Dimitrova <e.dimitr...@gmail.com>
> wrote:
>
>> "jenkins_jira_integration
>> <https://github.com/apache/cassandra-builds/blob/trunk/jenkins-jira-integration/jenkins_jira_integration.py>
>> script updating the JIRA ticket with test results if you cause a regression
>> + us building a muscle around reverting your commit if they break tests."
>>
>> I am not sure the problem of people finding the time to fix their breakages
>> will be solved, but at least they will be pinged automatically. Hopefully
>> many people follow Jira updates.
>>
>> "I don't take the past as strongly indicative of the future here since
>> we've been allowing circle to validate pre-commit and haven't been
>> multiplexing."
>> I am interested to compare how many tickets for flaky tests we will have
>> pre-5.0 compared to what we had pre-4.1.
>>
>>
>> On Wed, 12 Jul 2023 at 8:41, Josh McKenzie <jmcken...@apache.org> wrote:
>>
>>> (This response ended up being a bit longer than intended; sorry about
>>> that)
>>>
>>> What is more common though is packaging errors,
>>> cdc/compression/system_ks_directory targeted fixes, CI w/wo
>>> upgrade tests, being less responsive post-commit as you already
>>> moved on
>>>
>>> *Two that **should** be resolved in the new regime:*
>>> * Packaging errors should be caught pre-commit, as we're making the
>>> artifact builds part of pre-commit.
>>> * I'm hoping to merge the commit log segment allocation so the CDC
>>> allocator is the only one for 5.0 (and it just bypasses the cdc-related
>>> work on allocation if it's disabled, thus not impacting perf); the existing
>>> targeted testing of cdc-specific functionality should be sufficient to
>>> confirm its correctness as it doesn't vary from the primary allocation path
>>> when it comes to mutation space in the buffer.
>>> * Upgrade tests are going to be part of the pre-commit suite.
>>>
>>> *Outstanding issues:*
>>> * Compression: if we just run with defaults we won't test all cases, so
>>> errors could pop up here.
>>> * system_ks_directory-related things: is this still ongoing, or did we have
>>> a transient burst of these types of issues? And would we expect these to
>>> vary based on different JDKs, non-default configurations, etc.?
>>> * Being less responsive post-commit: My only ideas here are a
>>> combination of the jenkins_jira_integration
>>> <https://github.com/apache/cassandra-builds/blob/trunk/jenkins-jira-integration/jenkins_jira_integration.py>
>>> script updating the JIRA ticket with test results if you cause a regression
>>> + us building a muscle around reverting your commit if they break tests.
>>>
>>> To quote Jacek:
>>>
>>> why don't we run dtests w/wo sstable compression x w/wo internode
>>> encryption x w/wo vnodes,
>>> w/wo off-heap buffers x j8/j11/j17 x w/wo CDC x RedHat/Debian/SUSE, etc.
>>> I think this is a matter of cost vs result.
>>>
>>>
>>> I think we've organically made these decisions and tradeoffs in the past
>>> without being methodical about it. If we can:
>>> 1. Multiplex changed or new tests
>>> 2. Tighten the feedback loop of "tests were green, now they're
>>> *consistently* not, you're the only one who changed something", and
>>> 3. Instill a culture of "if you can't fix it immediately, revert your
>>> commit"
>>>
>>> Then I think we'll only be vulnerable to flaky failures introduced
>>> across different non-default configurations as side effects in tests that
>>> aren't touched, which *intuitively* feels like a lot less than we're
>>> facing today. We could even get clever as a day 2 effort and define
>>> packages in the primary codebase where changes take place and multiplex (on
>>> a smaller scale) their respective packages of unit tests in the future if
>>> we see problems in this area.
>>>
>>> Flaky tests are a giant pain in the ass and a huge drain on
>>> productivity, don't get me wrong. *And* we have to balance how much
>>> cost we're paying before each commit with the benefit we expect to gain
>>> from that.
>>>
>>> Does the above make sense? Are there things you've seen in the trenches
>>> that challenge or invalidate any of those perspectives?
>>>
>>> On Wed, Jul 12, 2023, at 7:28 AM, Jacek Lewandowski wrote:
>>>
>>> Isn't novnodes a special case of vnodes with n=1?
>>>
>>> We should rather select a subset of tests that it makes sense to run
>>> with different configurations.
>>>
>>> The set of configurations against which we currently run the tests is
>>> still only a subset of all possible cases.
>>> I could ask - why don't we run dtests w/wo sstable compression x w/wo
>>> internode encryption x w/wo vnodes,
>>> w/wo off-heap buffers x j8/j11/j17 x w/wo CDC x RedHat/Debian/SUSE, etc.
>>> I think this is a matter of cost vs result.
>>> This equation includes the likelihood of failure in configuration X given
>>> there was no failure in the default configuration, the cost of running
>>> those tests, the time we delay merging, and the likelihood that we wait
>>> for the test results so long that our branch diverges and we either have
>>> to rerun them or accept the fact that we merge code which was tested on
>>> an outdated base. Finally, there is the overall new-contributor
>>> experience - whether they want to participate in the future.
>>>
>>>
>>>
>>> On Wed, 12 Jul 2023 at 07:24 Berenguer Blasi <berenguerbl...@gmail.com>
>>> wrote:
>>>
>>> On our 4.0 release I remember a number of such failures, but not
>>> recently. What is more common though is packaging errors,
>>> cdc/compression/system_ks_directory targeted fixes, CI w/wo upgrade tests,
>>> being less responsive post-commit as you already moved on, ... Either the
>>> smoke pre-commit has approval steps for everything, or imo we should give
>>> a devBranch-like job to the dev pre-commit. I find it terribly useful. My
>>> 2cts.
>>> On 11/7/23 18:26, Josh McKenzie wrote:
>>>
>>> 2: Pre-commit 'devBranch' full suite for high risk/disruptive merges: at
>>> reviewer's discretion
>>>
>>> In general, maybe offering a dev the option of choosing either
>>> "pre-commit smoke" or "post-commit full" at their discretion for any work
>>> would be the right play.
>>>
>>> A follow-on thought: even with something as significant as Accord, TCM,
>>> Trie data structures, etc. - I'd be a bit surprised to see tests fail on
>>> JDK17 that didn't on 11, or with vs. without vnodes, in ways where it
>>> wasn't immediately clear that the patch stumbled across something
>>> surprising and was trivially attributable if not fixable. *In theory* the
>>> things we're talking about excluding from the pre-commit smoke test suite
>>> are all things that are supposed to be identical across environments and
>>> thus opaque / interchangeable by default (JDK version, aside from checking
>>> the build, which we will do; vnodes vs. non, etc.).
>>>
>>> Has that not proven to be the case in your experience?
>>>
>>> On Tue, Jul 11, 2023, at 10:15 AM, Derek Chen-Becker wrote:
>>>
>>> A strong +1 to getting to a single CI system. CircleCI definitely has
>>> some niceties and I understand why it's currently used, but right now we
>>> get 2 CI systems for twice the price. +1 on the proposed subsets.
>>>
>>> Derek
>>>
>>> On Mon, Jul 10, 2023 at 9:37 AM Josh McKenzie <jmcken...@apache.org>
>>> wrote:
>>>
>>>
>>> I'm personally not thinking about CircleCI at all; I'm envisioning a
>>> world where all of us have 1 CI *software* system (i.e. reproducible on
>>> any env) that we use for pre-commit validation, and then post-commit
>>> happens on reference ASF hardware.
>>>
>>> So:
>>> 1: Pre-commit subset of tests (suites + matrices + env) runs. On green,
>>> merge.
>>> 2: Post-commit tests (all suites, matrices, env) run. On failure, link
>>> back to the JIRA where the commit took place.
>>>
>>> Circle would need to remain in lockstep with the requirements for point
>>> 1 here.
>>>
>>> On Mon, Jul 10, 2023, at 1:04 AM, Berenguer Blasi wrote:
>>>
>>> +1 to Josh which is exactly my line of thought as well. But that is only
>>> valid if we have a solid Jenkins that will eventually run all test configs.
>>> So I think I lost track a bit here. Are you proposing:
>>>
>>> 1- CircleCI: Run pre-commit a single (the most common/meaningful, TBD)
>>> config of tests
>>>
>>> 2- Jenkins: Runs post-commit _all_ test configs and emails/notifies you
>>> in case of problems?
>>>
>>> Or something different, like having 1 also in Jenkins?
>>> On 7/7/23 17:55, Andrés de la Peña wrote:
>>>
>>> I think 500 runs combining all configs could be reasonable, since it's
>>> unlikely to have config-specific flaky tests. As in five configs with 100
>>> repetitions each.
>>>
>>> On Fri, 7 Jul 2023 at 16:14, Josh McKenzie <jmcken...@apache.org> wrote:
>>>
>>> Maybe. Kind of depends on how long we write our tests to run, doesn't it?
>>> :)
>>>
>>> But point taken. Any non-trivial test would start to be something of a
>>> beast under this approach.
>>>
>>> On Fri, Jul 7, 2023, at 11:12 AM, Brandon Williams wrote:
>>>
>>> On Fri, Jul 7, 2023 at 10:09 AM Josh McKenzie <jmcken...@apache.org>
>>> wrote:
>>> > 3. Multiplexed tests (changed, added) run against all JDK's and a
>>> broader range of configs (no-vnode, vnode default, compression, etc)
>>>
>>> I think this is going to be too heavy... we're taking 500 iterations
>>> and multiplying that by like 4 or 5?
>>>
>>>
>>>
>>>
>>>
>>> --
>>> +---------------------------------------------------------------+
>>> | Derek Chen-Becker                                             |
>>> | GPG Key available at https://keybase.io/dchenbecker and       |
>>> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
>>> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
>>> +---------------------------------------------------------------+
>>>
>>>
>>>
>>>
