Hi Matt, I agree.
I think we should do a full build not after every single merge, but after a "group" of merges. Otherwise we would run a full build after each PR merge, which is basically what we have today and is not practical at all. That’s why, as a first step, I’m proposing to run it once a week or "on demand".

Regards
JB

> On Mar 16, 2021, at 22:15, Matt Pavlovich <[email protected]> wrote:
>
> Feels like we are in a transition period. I don’t see a per-PR unit test job being practical until the execution times come way down, and that is going to be a significant engineering effort.
>
> That being said, a full build with full tests the day after a merged change seems like a reasonable schedule.
>
>> On Mar 16, 2021, at 3:24 PM, Hossack, Etienne <[email protected]> wrote:
>>
>> Hi all, just some thoughts to share here:
>>
>> I think ideally I would expect a “static” build to be a nightly build run every day, but given the frequency of contributions, weekly makes sense (and I know it takes about 30% of the day 😅).
>> Maybe it could also run a snapshot version of the dependency checker, to fail the build on CVEs or on new types of checks from that tool.
>>
>> And then for contributions, the PR could run the lightweight profile, and master could run the full profile on merge?
>>
>> Does that make sense?
>> In summary, I’m expressing agreement with a static build, but also suggesting that a full build be run on contributions, in case there are multiple merges in a week or a merge lands right after the static build has run, which would otherwise increase the time to discovery of errors.
>>
>> Cheers,
>> Étienne Hossack
>> Software Development Engineer, Amazon MQ
>> email: [email protected]
>> phone: +1-778-945-8287
>>
>>> On Mar 15, 2021, at 10:05 PM, Jean-Baptiste Onofre <[email protected]> wrote:
>>>
>>> Hi guys,
>>>
>>> I created the https://github.com/apache/activemq/pull/622 PR about this, and you can see that Jenkins is happy now. The full build took about 120 min (2h) on Jenkins.
>>>
>>> Basically, what I did in the PR:
>>> - removed activemq-unit-tests and the itests (Karaf, Spring3) from the default reactor
>>> - introduced a full.test profile that builds all modules, including the unit tests and itests
>>>
>>> The full.test profile is not used in the Jenkinsfile, meaning that the PR build executes all tests except those in the activemq-unit-tests module and the itests. I think it’s acceptable for PRs (and it already takes 2 hours ;)).
>>> I would like to introduce a "static" build on ci-builds.apache.org (not via the Jenkinsfile), executed every week and doing a full build (including the full.test profile).
>>>
>>> Thoughts?
>>>
>>> Regards
>>> JB
>>>
>>>> On Mar 15, 2021, at 08:20, Jean-Baptiste Onofre <[email protected]> wrote:
>>>>
>>>> Hi guys,
>>>>
>>>> I have created the following Jiras for the tests I found "flaky" (in a full build, not necessarily in a single execution; it can also depend on the machine, which is why I tested with several Docker setups in terms of CPU and memory):
>>>>
>>>> AMQ-8190: DuplexAdvisoryRaceTest is failing (Jonathan said he is going to take a look)
>>>> AMQ-8189: CachedLDAPAuthorizationModuleTest is failing
>>>> AMQ-8188: AMQ5266SingleDestTest is failing
>>>>
>>>> There’s a test failure in the leveldb module, but it’s not a big deal as I have the PR ready to remove leveldb (https://github.com/apache/activemq/pull/593).
>>>>
>>>> I’m also retesting StompNIOSSLTest; it seems way more stable thanks to Chris.
>>>>
>>>> I also created AMQ-8191 (linked to the previous Jiras) about cleaning up the profiles, introducing the fast.test profile and using it on Jenkins, and excluding the failing tests until they are fixed (and re-including them at that time).
>>>>
>>>> AMQ-8191 is almost ready; I’m testing it.
>>>>
>>>> Regards
>>>> JB
>>>>
>>>>> On Mar 14, 2021, at 06:04, Jean-Baptiste Onofre <[email protected]> wrote:
>>>>>
>>>>> Hi guys,
>>>>>
>>>>> I’ve updated my local branch according to your comments:
>>>>>
>>>>> 1. I’ve cleaned up the profiles and introduced/renamed a fast profile that executes all unit tests in the modules but excludes activemq-unit-tests and karaf-itests.
>>>>> 2. I’m keeping the smoke test profile.
>>>>> 3. I’ve created a tobefixed profile that includes all the flaky tests I’ve identified.
>>>>> 4. I’ve updated the Jenkinsfile to use the fast profile on PRs.
>>>>>
>>>>> I will create the PR soon.
>>>>>
>>>>> Regards
>>>>> JB
>>>>>
>>>>>> On Mar 13, 2021, at 06:05, Jean-Baptiste Onofre <[email protected]> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We already have a "fast" profile, and it’s a good idea to use this profile on Jenkins by default and move some tests there.
>>>>>>
>>>>>> For instance, I don’t think it’s required to launch all of activemq-unit-tests by default, but I would keep the tests in each module (they are fast and don’t need the whole broker infrastructure).
>>>>>>
>>>>>> About RetryRule, I did that in Karaf as well; let me see if it helps for ActiveMQ.
>>>>>>
>>>>>> Thanks!
>>>>>> I will improve things this way.
>>>>>>
>>>>>> Regards
>>>>>> JB
>>>>>>
>>>>>>> On Mar 12, 2021, at 20:31, Clebert Suconic <[email protected]> wrote:
>>>>>>>
>>>>>>> You should instead have a fast profile, with a subset of the test suite, to run on every commit and branch for these cases. I looked on Jenkins, and having many builds taking 3 hours each won't really scale on the lab anyway. Failures will only make things worse there.
>>>>>>>
>>>>>>> The lab is usually not powerful enough for long-running tests.
>>>>>>>
>>>>>>> And a full profile that should run as part of a full run (say, once a day instead of on every commit), or at any interval you choose.
>>>>>>>
>>>>>>> I don't think you should hide tests, though, as that is like sweeping dirt under the rug (even if you say you will enable them later: as with anything in life, temporary solutions usually end up being definitive).
>>>>>>>
>>>>>>> As in any system dealing with timing and asynchrony, flakiness and races are part of daily life. One thing I did in ActiveMQ Artemis was to write a Rule where the test is retried. You could also add retries to tests in cases where it is acceptable... but be careful not to just hide bugs away in this case as well.
>>>>>>>
>>>>>>> If you are interested, look on Artemis for usages of https://github.com/apache/activemq-artemis/blob/master/artemis-commons/src/test/java/org/apache/activemq/artemis/utils/RetryRule.java
>>>>>>>
>>>>>>> You need to activate a profile in Artemis for the RetryRule to work.
>>>>>>>
>>>>>>>> On Fri, Mar 12, 2021 at 1:56 PM JB Onofré <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Yes, agreed. I’m launching new builds ;)
>>>>>>>>
>>>>>>>>> On Mar 12, 2021, at 19:51, Christopher Shannon <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Just running it by itself on the command line and also in the IDE. The full build takes a while, and if it's breaking with that, then it's probably some other test that isn't cleaning up properly in between runs.
>>>>>>>>>
>>>>>>>>>> On Fri, Mar 12, 2021 at 1:47 PM JB Onofré <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> Did you try it in a full build or the test individually? I’m running a new build.
>>>>>>>>>>
>>>>>>>>>>> On Mar 12, 2021, at 19:38, Christopher Shannon <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> I've been running DurableSyncNetworkBridgeTest several times on my box, and it always passes.
>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Mar 12, 2021 at 1:25 PM Christopher Shannon <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Ideally it would be better to fix tests than to simply exclude them. These tests were added for a reason, I would presume (I know I had worked on the durable sync stuff in the past), so randomly turning off tests could lead to missing errors.
>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Mar 12, 2021 at 12:57 PM Jean-Baptiste Onofre <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I’m adding these tests to the list to be fixed/improved:
>>>>>>>>>>>>>
>>>>>>>>>>>>> FailoverDurableSubTransactionTest.testFailoverCommitListener
>>>>>>>>>>>>> DurableSyncNetworkBridgeTest.testRemoveSubscriptionPropagate
>>>>>>>>>>>>> DurableSyncNetworkBridgeTest.testRemoveSubscriptionWithBridgeOffline
>>>>>>>>>>>>>
>>>>>>>>>>>>> Let me create the Jira and a PR to exclude the tests, and verify that Jenkins is happy.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards
>>>>>>>>>>>>> JB
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mar 12, 2021, at 16:14, Jonathan Gallimore <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm +1 on the actions :).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jon
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Mar 12, 2021 at 3:11 PM Jean-Baptiste Onofre <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sure, thanks for the help!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Just waiting for some feedback before starting the "actions" ;)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>> JB
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mar 12, 2021, at 14:29, Jonathan Gallimore <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I ran into this test failing yesterday: activemq-unit-tests/src/test/java/org/apache/activemq/usecases/DuplexAdvisoryRaceTest.java. I'd be happy to try to contribute a fix. Would you like to assign the Jira to me?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Jon
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Fri, Mar 12, 2021 at 12:58 PM Jean-Baptiste Onofre <[email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi guys,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Now that we have a Jenkinsfile in our repo and use a Jenkins pipeline, we have dramatically improved our build: it is executed for each pull request or commit on the main branch.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> However, we have a lot of failing tests, which cause the build to fail quite systematically on ci-builds.apache.org.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> We really need a clean, accurate and stable build: it will improve issue detection and simplify reviews, especially for pull requests.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I ran several builds on my machine (with different Docker containers) and have already identified some failing/flaky tests:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> - activemq-leveldb-store/src/test/java/org/apache/activemq/leveldb/test/ElectingLevelDBStoreTest.java: not a big deal, as I have a PR removing LevelDB completely
>>>>>>>>>>>>>>>>> - activemq-stomp/src/test/java/org/apache/activemq/transport/stomp/Stomp11NIOSSLTest.java: Chris made an improvement, but I still see some flakiness here
>>>>>>>>>>>>>>>>> - activemq-unit-tests/src/test/java/org/apache/activemq/usecases/DuplexAdvisoryRaceTest.java
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I propose the following action plan:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1. Create a Jira for each failing/flaky test.
>>>>>>>>>>>>>>>>> 2. Exclude the tests (in the surefire plugin configuration) to get a "green light" on Jenkins.
>>>>>>>>>>>>>>>>> 3. For each Jira, work on a pull request, making sure that Jenkins is still "happy".
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Anyone willing to help with (3) is welcome!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If there’s no objection, I will start with (1) and (2).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>>> JB
>>>>>>>
>>>>>>> --
>>>>>>> Clebert Suconic
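The distinction the thread relies on, a test that fails on every run versus one that fails only in some runs (JB's several full builds on different Docker CPU/memory setups), can be expressed as a small classification. This is a minimal Java sketch under stated assumptions, not part of the ActiveMQ build: the class FlakyClassifier, the Verdict enum and the input shape (test name mapped to one pass/fail boolean per run) are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not ActiveMQ code): classifies tests from repeated runs.
// Input per test: one Boolean per run, true = passed, false = failed.
// The lists are assumed not to contain nulls.
final class FlakyClassifier {

    enum Verdict { STABLE, FAILING, FLAKY }

    private FlakyClassifier() {}

    static Verdict classify(List<Boolean> passedPerRun) {
        if (passedPerRun.isEmpty()) {
            throw new IllegalArgumentException("need at least one run");
        }
        boolean anyPass = passedPerRun.contains(Boolean.TRUE);
        boolean anyFail = passedPerRun.contains(Boolean.FALSE);
        if (anyPass && anyFail) {
            // Same code, different outcomes across runs: timing, machine or test-order dependent.
            return Verdict.FLAKY;
        }
        // Consistent outcome across every run.
        return anyPass ? Verdict.STABLE : Verdict.FAILING;
    }

    // Classifies every test; a TreeMap keeps the report sorted by test name.
    static Map<String, Verdict> classifyAll(Map<String, List<Boolean>> results) {
        Map<String, Verdict> verdicts = new TreeMap<>();
        results.forEach((test, runs) -> verdicts.put(test, classify(runs)));
        return verdicts;
    }
}
```

A FAILING verdict points to a real defect or an environment problem, while a FLAKY verdict is the case where, as Chris suggests, another test may not be cleaning up properly between runs.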

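Clebert’s suggestion of retrying a flaky test (the Artemis RetryRule, a JUnit rule) comes down to a retry loop around the test body. The sketch below is NOT the Artemis RetryRule and does not use JUnit; RetrySketch, Outcome, runWithRetries and passedOnlyAfterRetry are hypothetical names, and it assumes the test body can be wrapped in a java.util.concurrent.Callable.

```java
import java.util.concurrent.Callable;

// Hypothetical helper illustrating the "retry a flaky test" idea.
// This is NOT the Artemis RetryRule (a JUnit rule); it only shows the
// retry loop such a rule wraps around a test body.
final class RetrySketch {

    // Result of running one test body with retries.
    static final class Outcome {
        final boolean passed;         // true if any attempt succeeded
        final int attempts;           // number of attempts actually made
        final Throwable lastFailure;  // most recent failure, or null if none

        Outcome(boolean passed, int attempts, Throwable lastFailure) {
            this.passed = passed;
            this.attempts = attempts;
            this.lastFailure = lastFailure;
        }

        // A pass that needed more than one attempt is still suspicious.
        boolean passedOnlyAfterRetry() {
            return passed && attempts > 1;
        }
    }

    private RetrySketch() {}

    // Runs testBody up to maxAttempts times, stopping at the first success.
    static Outcome runWithRetries(Callable<?> testBody, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Throwable lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                testBody.call();
                if (attempt > 1) {
                    // Make the flakiness visible instead of silently hiding it.
                    System.err.println("Passed on attempt " + attempt
                            + " after failure: " + lastFailure);
                }
                return new Outcome(true, attempt, lastFailure);
            } catch (Throwable t) { // also catches AssertionError from failed asserts
                lastFailure = t;
            }
        }
        return new Outcome(false, maxAttempts, lastFailure);
    }
}
```

Keeping the attempt count and the last failure in the result follows Clebert’s caution that retries should not simply hide bugs: a test that passes only on a later attempt is still reported as suspect rather than counted as a clean pass.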