Re: [PROPOSAL] Improve ActiveMQ 5 build stability

Jean-Baptiste Onofre Tue, 16 Mar 2021 21:41:33 -0700

Hi Etienne,

Thanks for your feedback.


Yes, regarding how long the build takes, even on PRs, we are on a "challenging" 
path (2 hours to build each PR is long, and that’s why it should be stable).

A full build takes between 3 and 4 hours, so it’s definitely not possible on PRs. 
That’s why I proposed a run once a week (it’s not a nightly build, it’s a weekly 
build ;)), generating SNAPSHOTs.
We can always run this weekly build on demand (via ci-builds.apache.org).

I can include the versions checker generation in this "weekly build" (I have a 
profile on a local branch doing that).

Regards
JB

> On 16 March 2021 at 21:24, Hossack, Etienne <[email protected]> wrote:
> 
> Hi all, just some thoughts to share here: 
> 
> I think ideally I would expect a “static” build to be a nightly build run 
> every day - but I think given the frequency of contributions, weekly makes 
> sense (and I know it takes about 30% of the day 😅).
> Maybe it could also run a snapshot version of the dependency checker, to fail 
> the build on CVEs or new types of checks from that tool. 
> 
> And then for contributions, the PR can run the lightweight profile, but then 
> master could run the full profile on merge?
> 
> Does that make sense?
> In summary I think it’s me expressing agreement for a static build, but also 
> suggesting a full build be run on contributions in case there are multiple 
> merges in a week, or say right after the build is run, and increasing the 
> time-to-discovery of errors.
> 
> Cheers,
> Étienne Hossack
> Software Development Engineer, Amazon MQ
> email: [email protected] <mailto:[email protected]>
> phone: +1-778-945-8287
> 
> 
> 
>> On Mar 15, 2021, at 10:05 PM, Jean-Baptiste Onofre <[email protected]> wrote:
>> 
>> Hi guys,
>> 
>> I created a PR about this, https://github.com/apache/activemq/pull/622, and 
>> you can see that Jenkins is happy now. The full build took about 120 min (2 h) 
>> on Jenkins.
>> 
>> Basically, what I did in the PR:
>> - remove activemq-unit-tests and itests (Karaf, Spring3) from the default 
>> reactor
>> - introduce a full.test profile that builds all modules, including unit tests 
>> and itests
>> 
>> The full.test profile is not used in the Jenkinsfile, meaning that the PR 
>> build executes all tests except the activemq-unit-tests module and the 
>> itests. I think it’s acceptable for PRs (and it already takes 2 hours ;)).
>> I would like to introduce a "static" build on ci-builds.apache.org (not via 
>> the Jenkinsfile), executed every week and doing a full build (including the 
>> full.test profile).
>> 
>> Thoughts ?
>> 
>> Regards
>> JB
>> 
>>> On 15 March 2021 at 08:20, Jean-Baptiste Onofre <[email protected]> wrote:
>>> 
>>> Hi guys,
>>> 
>>> I have created the following Jira issues for the tests I found "flaky" (in a 
>>> full build, not necessarily in a single execution; it can also depend on the 
>>> machine, which is why I tested with several Docker setups in terms of CPU 
>>> and memory):
>>> 
>>> AMQ-8190: DuplexAdvisoryRaceTest is failing (Jonathan said he is going to 
>>> take a look)
>>> AMQ-8189: CachedLDAPAuthorizationModuleTest is failing
>>> AMQ-8188: AMQ5266SingleDestTest is failing
>>> 
>>> There’s a test failure in the leveldb module, but it’s not a big deal as I 
>>> have a PR ready to remove leveldb 
>>> (https://github.com/apache/activemq/pull/593).
>>> 
>>> I’m also retesting StompNIOSSLTest; it seems way more stable thanks to Chris.
>>> 
>>> I also created AMQ-8191 (linked with the previous Jira issues) about 
>>> cleaning up the profiles, introducing the fast.test profile and using it on 
>>> Jenkins, and excluding the failing tests until they are fixed (and 
>>> re-including them at that time).
>>> 
>>> AMQ-8191 is almost ready, I’m testing.
>>> 
>>> Regards
>>> JB
>>> 
>>>> On 14 March 2021 at 06:04, Jean-Baptiste Onofre <[email protected]> wrote:
>>>> 
>>>> Hi guys,
>>>> 
>>>> I’ve updated my local branch according to your comments:
>>>> 
>>>> 1. I’ve cleaned up the profiles and introduced/renamed a fast profile that 
>>>> executes all unit tests in the modules but excludes activemq-unit-tests and 
>>>> karaf-itests.
>>>> 2. I’m keeping the smoke test profile
>>>> 3. I’ve created a tobefixed profile that includes all the flaky tests I’ve 
>>>> identified
>>>> 4. I’ve updated the Jenkinsfile to use the fast profile on PRs
>>>> 
>>>> I will create the PR soon.
>>>> 
>>>> Regards
>>>> JB
>>>> 
>>>>> On 13 March 2021 at 06:05, Jean-Baptiste Onofre <[email protected]> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> We already have a "fast" profile, and it’s a good idea to use this profile 
>>>>> on Jenkins by default and move some tests there.
>>>>> 
>>>>> For instance, I don’t think it’s required to launch all of 
>>>>> activemq-unit-tests by default, but I would keep the tests in each module 
>>>>> (they are fast and don’t need the whole broker infra).
>>>>> 
>>>>> About RetryRule, I did that in Karaf as well, let me see if it helps for 
>>>>> ActiveMQ.
>>>>> 
>>>>> Thanks !
>>>>> I will improve this way.
>>>>> 
>>>>> Regards
>>>>> JB
>>>>> 
>>>>>> On 12 March 2021 at 20:31, Clebert Suconic <[email protected]> wrote:
>>>>>> 
>>>>>> You should instead have a fast profile, with a subset of the testsuite
>>>>>> to run on every commit and branch for these cases. I looked on Jenkins
>>>>>> and having many builds taking 3 Hours each won't really scale on the
>>>>>> lab anyway. Failures will only make things worse there.
>>>>>> 
>>>>>> The lab is usually not powerful for long running tests.
>>>>>> 
>>>>>> And a full profile that should run as part of a full run (say, once
>>>>>> a day instead of on every commit), or at any interval you choose.
>>>>>> 
>>>>>> I don't think you should hide tests though, as that is like pushing
>>>>>> dirt under the rug (even if you say you will enable them later; as with
>>>>>> anything in life, temporary solutions usually end up being definitive).
>>>>>> 
>>>>>> As with any system dealing with timing and asynchrony, flakiness and
>>>>>> races are part of the day. One thing I did in ActiveMQ Artemis was to
>>>>>> write a Rule where the test is retried. You could also add retries to
>>>>>> tests in cases where it is acceptable, but be careful not to just hide
>>>>>> bugs away in this case as well.
>>>>>> 
>>>>>> If you are interested, in Artemis, look for usages of
>>>>>> https://github.com/apache/activemq-artemis/blob/master/artemis-commons/src/test/java/org/apache/activemq/artemis/utils/RetryRule.java
>>>>>> 
>>>>>> 
>>>>>> You need to activate a profile in artemis for the retryRule to work.
>>>>>> 
>>>>>> On Fri, Mar 12, 2021 at 1:56 PM JB Onofré <[email protected]> wrote:
>>>>>>> 
>>>>>>> Yes agree. I’m launching new builds ;)
>>>>>>> 
>>>>>>>> On 12 March 2021 at 19:51, Christopher Shannon <[email protected]> wrote:
>>>>>>>> 
>>>>>>>> Just running it by itself on the command line and also in the IDE.
>>>>>>>> The full build takes a while and if it's breaking with that then it's
>>>>>>>> probably some other test that isn't cleaning up properly in between
>>>>>>>> runs.
>>>>>>>> 
>>>>>>>>> On Fri, Mar 12, 2021 at 1:47 PM JB Onofré <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>> Did you try in a full build or the test individually? I’m running a
>>>>>>>>> new build.
>>>>>>>>> 
>>>>>>>>>> On 12 March 2021 at 19:38, Christopher Shannon <[email protected]> wrote:
>>>>>>>>>> 
>>>>>>>>>> I've been running the DurableSyncNetworkBridgeTest several times on
>>>>>>>>>> my box and it always passes.
>>>>>>>>>> 
>>>>>>>>>>> On Fri, Mar 12, 2021 at 1:25 PM Christopher Shannon <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Ideally it would be better to fix tests than to simply exclude them.
>>>>>>>>>>> These tests were added for a reason I would presume (I know I had
>>>>>>>>>>> worked on the durable sync stuff in the past) so randomly turning
>>>>>>>>>>> off tests could lead to missing errors.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Fri, Mar 12, 2021 at 12:57 PM Jean-Baptiste Onofre 
>>>>>>>>>>> <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> I’m adding these tests to be fixed/improved:
>>>>>>>>>>>> 
>>>>>>>>>>>> FailoverDurableSubTransactionTest.testFailoverCommitListener
>>>>>>>>>>>> DurableSyncNetworkBridgeTest.testRemoveSubscriptionPropagate
>>>>>>>>>>>> DurableSyncNetworkBridgeTest.testRemoveSubscriptionWithBridgeOffline
>>>>>>>>>>>> 
>>>>>>>>>>>> Let me create the Jira and create a PR to exclude the tests and 
>>>>>>>>>>>> verify
>>>>>>>>>>>> Jenkins is happy.
>>>>>>>>>>>> 
>>>>>>>>>>>> Regards
>>>>>>>>>>>> JB
>>>>>>>>>>>> 
>>>>>>>>>>>>> On 12 March 2021 at 16:14, Jonathan Gallimore <[email protected]> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I'm +1 on the actions :).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Jon
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Fri, Mar 12, 2021 at 3:11 PM Jean-Baptiste Onofre <[email protected]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Sure, thanks for the help !
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Just waiting for some feedback before starting the "actions" ;)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>> JB
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On 12 March 2021 at 14:29, Jonathan Gallimore <[email protected]> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I ran into this test failing yesterday:
>>>>>>>>>>>>>>> activemq-unit-tests/src/test/java/org/apache/activemq/usecases/DuplexAdvisoryRaceTest.java
>>>>>>>>>>>>>>> - I'd be happy to try and contribute a fix. Would you like to
>>>>>>>>>>>>>>> assign the JIRA to me?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Jon
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Fri, Mar 12, 2021 at 12:58 PM Jean-Baptiste Onofre <[email protected]>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hi guys,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Now that we have a Jenkinsfile in our repo and use a Jenkins
>>>>>>>>>>>>>>>> pipeline, we have dramatically improved our build: it is
>>>>>>>>>>>>>>>> executed for each pull request and each commit on the main
>>>>>>>>>>>>>>>> branch.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> However, we have a lot of failing tests, which cause the build
>>>>>>>>>>>>>>>> to fail quite systematically on ci-builds.apache.org.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> We really need to have a clean, accurate and stable build: it
>>>>>>>>>>>>>>>> will improve issue detection and simplify review, especially
>>>>>>>>>>>>>>>> for pull requests.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I ran several builds on my machine (with different Docker
>>>>>>>>>>>>>>>> containers) and I already identified some failing/flaky tests:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> - activemq-leveldb-store/src/test/java/org/apache/activemq/leveldb/test/ElectingLevelDBStoreTest.java
>>>>>>>>>>>>>>>>   is not a big deal as I have a PR removing leveldb completely
>>>>>>>>>>>>>>>> - activemq-stomp/src/test/java/org/apache/activemq/transport/stomp/Stomp11NIOSSLTest.java.
>>>>>>>>>>>>>>>>   Chris did an improvement, but I still have some flakiness here.
>>>>>>>>>>>>>>>> - activemq-unit-tests/src/test/java/org/apache/activemq/usecases/DuplexAdvisoryRaceTest.java
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I propose the following action plan:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 1. Create a Jira issue for each failing/flaky test.
>>>>>>>>>>>>>>>> 2. Exclude the tests (in the surefire plugin configuration) to
>>>>>>>>>>>>>>>> have a "green light" on Jenkins.
>>>>>>>>>>>>>>>> 3. For each Jira issue, we work on a pull request, to be sure
>>>>>>>>>>>>>>>> that Jenkins is still "happy".
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Anyone willing to help on (3) is welcome !
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> If there’s no objection, I will start with (1) and (2).
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>> JB
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Clebert Suconic
>>>>> 
>>>> 
>>> 
>> 
>
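
For readers following the RetryRule discussion in Clebert's message above, here
is a minimal sketch of the retry idea in plain Java. It is not the Artemis
RetryRule (which is a JUnit rule and is activated through a profile); the names
RetrySketch, Body and retry are hypothetical and chosen only for illustration.
The sketch assumes that a failing attempt shows up as a thrown exception or
assertion error, and it logs every failed attempt so that a retried test does
not silently hide a bug.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper for illustration only; not part of ActiveMQ or Artemis. */
public final class RetrySketch {

    /** A test body that is allowed to throw anything, like a JUnit test method. */
    @FunctionalInterface
    public interface Body {
        void run() throws Throwable;
    }

    /**
     * Runs the body up to maxAttempts times and returns the number of the
     * attempt that passed. If every attempt fails, the last failure is rethrown.
     */
    public static int retry(int maxAttempts, Body body) throws Throwable {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Throwable last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                body.run();
                return attempt; // passed, stop retrying
            } catch (Throwable t) {
                last = t;
                // Keep the failure visible so flakiness is tracked, not hidden.
                System.err.println("attempt " + attempt + " of " + maxAttempts
                        + " failed: " + t);
            }
        }
        throw last; // all attempts failed
    }

    public static void main(String[] args) throws Throwable {
        // Simulated flaky test: fails on the first two calls, passes on the third.
        AtomicInteger calls = new AtomicInteger();
        int used = retry(3, () -> {
            if (calls.incrementAndGet() < 3) {
                throw new AssertionError("simulated flaky failure");
            }
        });
        System.out.println("passed on attempt " + used);
    }
}
```

In a real test suite the same loop would normally live inside a JUnit rule or
extension so that it can be switched on per test; as noted in the thread, it
should be reserved for cases where a retry is acceptable.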
