Hi all,

Just some thoughts to share here: ideally I would expect a “static” build to be a nightly build, run every day. Given the frequency of contributions, though, weekly makes sense (and I know it takes about 30% of the day 😅). Maybe that build could also run a snapshot version of the dependency checker, to fail the build on CVEs or on new types of checks from that tool.

And then for contributions, the PR can run the lightweight profile, while master runs the full profile on merge? Does that make sense?

In summary, I agree with having a static build, but I also suggest running a full build on contributions merged to master. Otherwise, if there are multiple merges in a week, or a merge lands right after the weekly build has run, the time-to-discovery of errors increases.

Cheers,

Étienne Hossack
Software Development Engineer, Amazon MQ
email: [email protected]
phone: +1-778-945-8287

On Mar 15, 2021, at 10:05 PM, Jean-Baptiste Onofre <[email protected]> wrote:

Hi guys,

I created the PR https://github.com/apache/activemq/pull/622 about this, and you can see that Jenkins is happy now. The full build took about 120 min (2 h) on Jenkins.

Basically, what I did in the PR:
- removed activemq-unit-tests and the itests (Karaf, Spring3) from the default reactor
- introduced a full.test profile that builds all modules, including unit tests and itests

The full.test profile is not used in the Jenkinsfile, meaning that a PR build executes all tests except the activemq-unit-tests module and the itests. I think that’s acceptable for PRs (and it already takes 2 hours ;)).

I would like to introduce a "static" build on ci-builds.apache.org (not via Jenkinsfile), executed every week and doing a full build (including the full.test profile).

Thoughts?

Regards
JB

On 15 March 2021 at 08:20, Jean-Baptiste Onofre <[email protected]> wrote:

Hi guys,

I have created the following Jira issues for the tests I found "flaky" (in a full build, not necessarily in a single execution; it can also depend on the machine, which is why I tested with several Docker setups in terms of CPU and memory):

- AMQ-8190: DuplexAdvisoryRaceTest is failing (Jonathan said he is going to take a look)
- AMQ-8189: CachedLDAPAuthorizationModuleTest is failing
- AMQ-8188: AMQ5266SingleDestTest is failing

There’s a test failure in the leveldb module, but it’s not a big deal as I have a PR ready to remove leveldb (https://github.com/apache/activemq/pull/593).

I’m also retesting StompNIOSSLTest; it seems way more stable thanks to Chris.

I also created AMQ-8191 (linked with the previous Jira issues) about cleaning up the profiles, introducing the fast.test profile and using it on Jenkins, and excluding the failing tests until they are fixed (and re-including them at that time). AMQ-8191 is almost ready, I’m testing.

Regards
JB

On 14 March 2021 at 06:04, Jean-Baptiste Onofre <[email protected]> wrote:

Hi guys,

I’ve updated my local branch according to your comments:

1. I’ve cleaned up the profiles and introduced/renamed a fast profile that executes all unit tests in the modules but excludes activemq-unit-tests and karaf-itests.
2. I’m keeping the smoke test profile.
3. I’ve created a tobefixed profile that includes all the flaky tests I’ve identified.
4. I’ve updated the Jenkinsfile to use the fast profile on PRs.

I will create the PR soon.

Regards
JB

On 13 March 2021 at 06:05, Jean-Baptiste Onofre <[email protected]> wrote:

Hi,

We already have a "fast" profile, and it’s a good idea to use this profile on Jenkins by default and move some tests there. For instance, I don’t think it’s required to launch all of activemq-unit-tests by default, but I would keep the tests in each module (they are fast and don’t need the whole broker infrastructure).

About RetryRule, I did that in Karaf as well; let me see if it helps for ActiveMQ.

Thanks! I will improve it this way.

Regards
JB

On 12 March 2021 at 20:31, Clebert Suconic <[email protected]> wrote:

You should instead have a fast profile, with a subset of the test suite to run on every commit and branch for these cases. I looked at Jenkins, and having many builds taking 3 hours each won't really scale on the lab anyway. Failures will only make things worse there. The lab is usually not powerful enough for long-running tests.

And a full profile that should run as part of a full run (say, once a day instead of on every commit, or at any interval you choose).

I don't think you should hide tests though, as that is like sweeping dirt under the rug (even if you say you will enable them later; as with anything in life, temporary solutions usually end up being permanent).

As in any system dealing with timing and asynchronous behavior, flaky tests and races are part of the day. One thing I did in ActiveMQ Artemis was to write a Rule where the test is retried. You could also add retries to tests in cases where it is acceptable, but be careful not to just hide bugs away in this case as well.

If you are interested, in Artemis, look for usages of https://github.com/apache/activemq-artemis/blob/master/artemis-commons/src/test/java/org/apache/activemq/artemis/utils/RetryRule.java

You need to activate a profile in Artemis for the RetryRule to work.

On Fri, Mar 12, 2021 at 1:56 PM JB Onofré <[email protected]> wrote:

Yes, agreed. I’m launching new builds ;)

On 12 March 2021 at 19:51, Christopher Shannon <[email protected]> wrote:

Just running it by itself on the command line and also in the IDE. The full build takes a while, and if it's breaking with that then it's probably some other test that isn't cleaning up properly in between runs.

On Fri, Mar 12, 2021 at 1:47 PM JB Onofré <[email protected]> wrote:

Did you try in a full build or the test individually? I’m running a new build.

On 12 March 2021 at 19:38, Christopher Shannon <[email protected]> wrote:

I've been running the DurableSyncNetworkBridgeTest several times on my box and it always passes.

On Fri, Mar 12, 2021 at 1:25 PM Christopher Shannon <[email protected]> wrote:

Ideally it would be better to fix tests than to simply exclude them. These tests were added for a reason, I would presume (I know I had worked on the durable sync stuff in the past), so randomly turning off tests could lead to missing errors.

On Fri, Mar 12, 2021 at 12:57 PM Jean-Baptiste Onofre <[email protected]> wrote:

I’m adding these tests to be fixed/improved:

- FailoverDurableSubTransactionTest.testFailoverCommitListener
- DurableSyncNetworkBridgeTest.testRemoveSubscriptionPropagate
- DurableSyncNetworkBridgeTest.testRemoveSubscriptionWithBridgeOffline

Let me create the Jira and a PR to exclude the tests, and verify Jenkins is happy.

Regards
JB

On 12 March 2021 at 16:14, Jonathan Gallimore <[email protected]> wrote:

I'm +1 on the actions :).

Jon

On Fri, Mar 12, 2021 at 3:11 PM Jean-Baptiste Onofre <[email protected]> wrote:

Sure, thanks for the help!

Just waiting for some feedback before starting the "actions" ;)

Regards
JB

On 12 March 2021 at 14:29, Jonathan Gallimore <[email protected]> wrote:

I ran into this test failing yesterday: activemq-unit-tests/src/test/java/org/apache/activemq/usecases/DuplexAdvisoryRaceTest.java - I'd be happy to try and contribute a fix. Would you like to assign the JIRA to me?

Jon

On Fri, Mar 12, 2021 at 12:58 PM Jean-Baptiste Onofre <[email protected]> wrote:

Hi guys,

Now that we have a Jenkinsfile in our repo and we use a Jenkins pipeline, we have dramatically improved our build: the build is executed for each pull request and for each commit on the main branch.

However, we have a lot of failing tests, which causes the build to fail almost systematically on ci-builds.apache.org.

We really need to have a clean, accurate and stable build: it will improve issue detection and simplify reviews, especially for pull requests.

I ran several builds on my machine (with different Docker containers) and I have already identified some failing/flaky tests:

- activemq-leveldb-store/src/test/java/org/apache/activemq/leveldb/test/ElectingLevelDBStoreTest.java: not a big deal, as I have a PR removing leveldb completely
- activemq-stomp/src/test/java/org/apache/activemq/transport/stomp/Stomp11NIOSSLTest.java: Chris made an improvement, but I still see some flakiness here
- activemq-unit-tests/src/test/java/org/apache/activemq/usecases/DuplexAdvisoryRaceTest.java

I propose the following action plan:

1. Create a Jira for each failing/flaky test.
2. Exclude the tests (in the surefire plugin configuration) to get a "green light" on Jenkins.
3. For each Jira, work on a pull request, to be sure that Jenkins is still "happy".

Anyone willing to help on (3) is welcome!

If there’s no objection, I will start with (1) and (2).

Thanks,
Regards
JB

--
Clebert Suconic
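
For readers of this thread who want a concrete picture of the "retry a flaky test" idea Clebert mentions, here is a minimal plain-Java sketch. It is not the Artemis RetryRule (which is a JUnit rule activated by a profile) and it is not code from ActiveMQ; the class name RetrySketch and the method retry are hypothetical names chosen only for illustration. It simply re-runs a body up to a fixed number of times and rethrows the last failure, so a test that always fails still fails the build.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper for illustration only; not the Artemis RetryRule. */
public final class RetrySketch {

    private RetrySketch() {}

    /**
     * Runs the body until it completes without throwing, or until
     * maxAttempts is reached. The last failure is rethrown, so a
     * persistent bug is still reported instead of being hidden.
     */
    public static <T> T retry(int maxAttempts, Callable<T> body) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Throwable last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return body.call();
            } catch (Exception | AssertionError e) {
                // Remember the failure and log it, so retries stay visible.
                last = e;
                System.err.println("attempt " + attempt + " of " + maxAttempts + " failed: " + e);
            }
        }
        // Every iteration either returned or recorded a failure, so last is set.
        if (last instanceof Exception) {
            throw (Exception) last;
        }
        throw (AssertionError) last;
    }

    public static void main(String[] args) throws Exception {
        // Simulated flaky check: fails twice, then succeeds on the third call.
        AtomicInteger calls = new AtomicInteger();
        String result = retry(3, () -> {
            if (calls.incrementAndGet() < 3) {
                throw new IllegalStateException("simulated race");
            }
            return "ok";
        });
        System.out.println(result + " after " + calls.get() + " calls");
    }
}
```

Logging every failed attempt is a deliberate choice here: it keeps flakiness visible in the build output, in line with the caution in the thread about not using retries to hide real bugs.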
