Re: [PROPOSAL] Improve ActiveMQ 5 build stability

Jean-Baptiste Onofre Tue, 16 Mar 2021 21:45:45 -0700

Hi Matt,

I agree.


I think we should run a full build not after every single merge, but after a 
"group" of merges. Otherwise we would run a full build after each PR merge, 
which is basically what we have today and is not practical at all.

That’s why, as a first step, I’m proposing to run it once a week or "on demand".

Regards
JB

> On 16 March 2021 at 22:15, Matt Pavlovich <[email protected]> wrote:
> 
> Feels like we are in a transition period. I don’t see a per-PR unit test job 
> being practical until the execution times come way down, and that is going to 
> be a significant engineering effort.
> 
> That being said, full build with full tests the day after a merged change 
> seems like a reasonable schedule.
> 
>> On Mar 16, 2021, at 3:24 PM, Hossack, Etienne <[email protected]> wrote:
>> 
>> Hi all, just some thoughts to share here: 
>> 
>> I think ideally I would expect a “static” build to be a nightly build run 
>> every day - but I think given the frequency of contributions, weekly makes 
>> sense (and I know it takes about 30% of the day 😅).
>> Maybe that build could also run a snapshot version of the dependency checker, 
>> to fail the build on CVEs or on new types of checks from that tool.
>> 
>> And then for contributions, the PR can run the lightweight profile, but then 
>> master could run the full profile on merge?
>> 
>> Does that make sense?
>> In summary, I am agreeing with a static build, but also suggesting that a 
>> full build be run on contributions. Otherwise, if there are multiple merges 
>> in a week, or a merge lands right after the weekly build has run, the 
>> time-to-discovery of errors increases.
>> 
>> Cheers,
>> Étienne Hossack
>> Software Development Engineer, Amazon MQ
>> email: [email protected]
>> phone: +1-778-945-8287
>> 
>> 
>> 
>>> On Mar 15, 2021, at 10:05 PM, Jean-Baptiste Onofre <[email protected]> wrote:
>>> 
>>> Hi guys,
>>> 
>>> I created PR https://github.com/apache/activemq/pull/622 for this, and you 
>>> can see that Jenkins is happy now. The full build took about 120 minutes (2h) 
>>> on Jenkins.
>>> 
>>> Basically, what I did in the PR:
>>> - removed activemq-unit-tests and itests (Karaf, Spring3) from the default 
>>> reactor
>>> - introduced a full.test profile that builds all modules, including unit tests 
>>> and itests
>>> 
>>> The full.test profile is not used in the Jenkinsfile, meaning that a PR build 
>>> executes all tests except the activemq-unit-tests module and the itests. I 
>>> think that’s acceptable for a PR (and it already takes 2 hours ;)).
>>> I would like to introduce a "static" build on ci-builds.apache.org (not via 
>>> the Jenkinsfile), executed every week and doing a full build (including the 
>>> full.test profile).
>>> 
>>> Thoughts ?
>>> 
>>> Regards
>>> JB
>>> 
>>>> On 15 March 2021 at 08:20, Jean-Baptiste Onofre <[email protected]> wrote:
>>>> 
>>>> Hi guys,
>>>> 
>>>> I have created the following Jira issues for the tests I found "flaky" (in a 
>>>> full build, not necessarily in a single execution; it can also depend on the 
>>>> machine, which is why I tested with several Docker setups in terms of CPU and 
>>>> memory):
>>>> 
>>>> AMQ-8190: DuplexAdvisoryRaceTest is failing (Jonathan said he is going to take 
>>>> a look)
>>>> AMQ-8189: CachedLDAPAuthorizationModuleTest is failing
>>>> AMQ-8188: AMQ5266SingleDestTest is failing
>>>> 
>>>> There’s a test failure in the leveldb module, but it’s not a big deal as I 
>>>> have the PR ready to remove leveldb 
>>>> (https://github.com/apache/activemq/pull/593).
>>>> 
>>>> I’m also retesting StompNIOSSLTest, it seems way more stable thanks to 
>>>> Chris
>>>> 
>>>> I also created AMQ-8191 (linked to the previous Jira issues) covering cleanup 
>>>> of the profiles, introduction of the fast.test profile and its usage on 
>>>> Jenkins, and exclusion of the failing tests until they are fixed (they will 
>>>> be re-included at that time).
>>>> 
>>>> AMQ-8191 is almost ready, I’m testing.
>>>> 
>>>> Regards
>>>> JB
>>>> 
>>>>> On 14 March 2021 at 06:04, Jean-Baptiste Onofre <[email protected]> wrote:
>>>>> 
>>>>> Hi guys,
>>>>> 
>>>>> I’ve updated my local branch according to your comments:
>>>>> 
>>>>> 1. I’ve cleaned up the profiles and introduced/renamed a fast profile that 
>>>>> executes all unit tests in the modules but excludes activemq-unit-tests 
>>>>> and karaf-itests.
>>>>> 2. I’m keeping the smoke test profile.
>>>>> 3. I’ve created a tobefixed profile that includes all the flaky tests I’ve 
>>>>> identified.
>>>>> 4. I’ve updated the Jenkinsfile to use the fast profile on PRs.
>>>>> 
>>>>> I will create the PR soon.
>>>>> 
>>>>> Regards
>>>>> JB
>>>>> 
>>>>>> On 13 March 2021 at 06:05, Jean-Baptiste Onofre <[email protected]> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> We already have a "fast" profile, and it’s a good idea to use this profile 
>>>>>> on Jenkins by default and move some tests here.
>>>>>> 
>>>>>> For instance, I don’t think it’s required to launch all of 
>>>>>> activemq-unit-tests by default, but I would keep the tests in each module 
>>>>>> (they are fast and don’t need the whole broker infra).
>>>>>> 
>>>>>> About RetryRule, I did that in Karaf as well, let me see if it helps for 
>>>>>> ActiveMQ.
>>>>>> 
>>>>>> Thanks !
>>>>>> I will improve this way.
>>>>>> 
>>>>>> Regards
>>>>>> JB
>>>>>> 
>>>>>>> On 12 March 2021 at 20:31, Clebert Suconic <[email protected]> wrote:
>>>>>>> 
>>>>>>> You should instead have a fast profile, with a subset of the testsuite
>>>>>>> to run on every commit and branch for these cases. I looked on Jenkins
>>>>>>> and having many builds taking 3 Hours each won't really scale on the
>>>>>>> lab anyway. Failures will only make things worse there.
>>>>>>> 
>>>>>>> The lab is usually not powerful for long running tests.
>>>>>>> 
>>>>>>> And a full profile that should run as part of a full run (say, once
>>>>>>> a day instead of on every commit, or at any interval you choose).
>>>>>>> 
>>>>>>> I don't think you should hide tests though, as that is like pushing
>>>>>>> dirt under the rug (even if you say you will enable them later; as with
>>>>>>> anything in life, temporary solutions usually end up being definitive).
>>>>>>> 
>>>>>>> As in any system dealing with timing and asynchronous behavior, flakiness
>>>>>>> and races are part of the day. One thing I did in ActiveMQ Artemis was to
>>>>>>> write a Rule where the test is retried. You could also add retries to tests
>>>>>>> in cases where it is acceptable, but be careful not to just hide bugs away
>>>>>>> in this case as well.
>>>>>>> 
>>>>>>> If you are interested, on Artemis, look for usages of
>>>>>>> https://github.com/apache/activemq-artemis/blob/master/artemis-commons/src/test/java/org/apache/activemq/artemis/utils/RetryRule.java
>>>>>>> 
>>>>>>> 
>>>>>>> You need to activate a profile in artemis for the retryRule to work.
>>>>>>> 
>>>>>>> On Fri, Mar 12, 2021 at 1:56 PM JB Onofré <[email protected]> wrote:
>>>>>>>> 
>>>>>>>> Yes agree. I’m launching new builds ;)
>>>>>>>> 
>>>>>>>>> On 12 March 2021 at 19:51, Christopher Shannon <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>> Just running it by itself on the command line and also in the IDE. The
>>>>>>>>> full build takes a while and if it's breaking with that then it's
>>>>>>>>> probably some other test that isn't cleaning up properly in between runs.
>>>>>>>>> 
>>>>>>>>>> On Fri, Mar 12, 2021 at 1:47 PM JB Onofré <[email protected]> wrote:
>>>>>>>>>> 
>>>>>>>>>> Did you try in a full build or the test individually? I’m running a
>>>>>>>>>> new build.
>>>>>>>>>> 
>>>>>>>>>>> On 12 March 2021 at 19:38, Christopher Shannon <[email protected]> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> I've been running the DurableSyncNetworkBridgeTest several times on my
>>>>>>>>>>> box and it always passes.
>>>>>>>>>>> 
>>>>>>>>>>>> On Fri, Mar 12, 2021 at 1:25 PM Christopher Shannon <[email protected]> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Ideally it would be better to fix tests than to simply exclude them.
>>>>>>>>>>>> These tests were added for a reason I would presume (I know I had
>>>>>>>>>>>> worked on the durable sync stuff in the past) so randomly turning off
>>>>>>>>>>>> tests could lead to missing errors.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Fri, Mar 12, 2021 at 12:57 PM Jean-Baptiste Onofre <[email protected]> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> I’m adding these tests to be fixed/improved:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> FailoverDurableSubTransactionTest.testFailoverCommitListener
>>>>>>>>>>>>> DurableSyncNetworkBridgeTest.testRemoveSubscriptionPropagate
>>>>>>>>>>>>> DurableSyncNetworkBridgeTest.testRemoveSubscriptionWithBridgeOffline
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Let me create the Jira and create a PR to exclude the tests and
>>>>>>>>>>>>> verify Jenkins is happy.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Regards
>>>>>>>>>>>>> JB
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 12 March 2021 at 16:14, Jonathan Gallimore <[email protected]> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I'm +1 on the actions :).
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Jon
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Fri, Mar 12, 2021 at 3:11 PM Jean-Baptiste Onofre <[email protected]> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Sure, thanks for the help !
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Just waiting for some feedback before starting the "actions" ;)
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>> JB
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On 12 March 2021 at 14:29, Jonathan Gallimore <[email protected]> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I ran into this test failing yesterday:
>>>>>>>>>>>>>>>> activemq-unit-tests/src/test/java/org/apache/activemq/usecases/DuplexAdvisoryRaceTest.java
>>>>>>>>>>>>>>>> - I'd be happy to try and contribute a fix. Would you like to assign
>>>>>>>>>>>>>>>> the JIRA to me?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Jon
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Fri, Mar 12, 2021 at 12:58 PM Jean-Baptiste Onofre <[email protected]> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Hi guys,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Now that we have a Jenkinsfile in our repo and use a Jenkins
>>>>>>>>>>>>>>>>> pipeline, we have dramatically improved our build: it is executed
>>>>>>>>>>>>>>>>> for each pull request or commit on the main branch.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> However, we have a lot of failing tests, which quite systematically
>>>>>>>>>>>>>>>>> cause the build to fail on ci-builds.apache.org.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> We really need to have a clean, accurate and stable build: it will
>>>>>>>>>>>>>>>>> improve issue detection and simplify the review, especially for
>>>>>>>>>>>>>>>>> PullRequests.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I ran several builds on my machine (with different docker
>>>>>>>>>>>>>>>>> containers) and I already identified some failing/flaky tests:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> - activemq-leveldb-store/src/test/java/org/apache/activemq/leveldb/test/ElectingLevelDBStoreTest.java
>>>>>>>>>>>>>>>>> is not a big deal as I have a PR removing leveldb completely
>>>>>>>>>>>>>>>>> - activemq-stomp/src/test/java/org/apache/activemq/transport/stomp/Stomp11NIOSSLTest.java.
>>>>>>>>>>>>>>>>> Chris did an improvement, but I still have some flakiness here.
>>>>>>>>>>>>>>>>> - activemq-unit-tests/src/test/java/org/apache/activemq/usecases/DuplexAdvisoryRaceTest.java
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I propose the following action plan:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 1. Create a Jira for each failing/flaky test.
>>>>>>>>>>>>>>>>> 2. Exclude the tests (in the surefire plugin configuration) to have
>>>>>>>>>>>>>>>>> a "green light" on Jenkins.
>>>>>>>>>>>>>>>>> 3. For each Jira, we work on a PullRequest, to be sure that Jenkins
>>>>>>>>>>>>>>>>> is still "happy".
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Anyone willing to help on (3) is welcome !
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> If there’s no objection, I will start with (1) and (2).
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>>> JB
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Clebert Suconic
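
The retry idea raised in the thread can be illustrated with a minimal sketch in plain Java, with no JUnit dependency. This is not the Artemis RetryRule and not code from the ActiveMQ build: the names RetrySketch, CheckedBody and runWithRetries are hypothetical, chosen only for illustration. The sketch assumes a failing attempt signals failure by throwing an Exception or an AssertionError. It reruns the body up to a fixed number of attempts and rethrows the last failure, so a consistently broken test is still reported rather than hidden.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RetrySketch {

    /** A test body that may throw, similar to a test method (hypothetical type). */
    @FunctionalInterface
    public interface CheckedBody {
        void run() throws Exception;
    }

    /**
     * Runs the body up to maxAttempts times and returns the number of the
     * attempt that succeeded. If every attempt fails, the last failure is
     * rethrown so that a test which always fails is still reported.
     */
    public static int runWithRetries(int maxAttempts, CheckedBody body) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Throwable last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                body.run();
                return attempt;
            } catch (Exception | AssertionError e) {
                // Remember the failure and log it, so retries stay visible.
                last = e;
                System.err.println("attempt " + attempt + " failed: " + e.getMessage());
            }
        }
        if (last instanceof Error) {
            throw (Error) last;
        }
        throw (Exception) last;
    }

    public static void main(String[] args) throws Exception {
        // Simulated flaky body: fails on the first two calls, passes on the third.
        AtomicInteger calls = new AtomicInteger();
        int attempts = runWithRetries(3, () -> {
            if (calls.incrementAndGet() < 3) {
                throw new IllegalStateException("simulated race");
            }
        });
        System.out.println("passed on attempt " + attempts);
    }
}
```

Logging each failed attempt keeps flakiness visible in the build output, which is in line with the concern in the thread about retries hiding real bugs.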
