On Fri, Jul 20, 2018 at 10:01 AM Tom Pantelis <tompante...@gmail.com> wrote:

> On Fri, Jul 20, 2018 at 4:48 AM, Anil Belur <abe...@linuxfoundation.org>
> wrote:
>
>> On Fri, Jul 20, 2018 at 11:12 AM Jenkins <
>> jenkins-dontre...@opendaylight.org> wrote:
>>
>>> Attention controller-devs,
>>>
>>> Autorelease oxygen failed to build sal-cluster-admin-impl from
>>> controller in build
>>> 359. Attached is a snippet of the error message related to the
>>> failure that we were able to automatically parse as well as console
>>> logs.
>>>
>>> Console Logs:
>>>
>>> https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/autorelease-release-oxygen/359
>>>
>>> Jenkins Build:
>>>
>>> https://jenkins.opendaylight.org/releng/job/autorelease-release-oxygen/359/
>>>
>>> Please review and provide an ETA on when a fix will be available.
>>>
>>> Thanks,
>>> ODL releng/autorelease team
>>>
>>  Hello controller-dev:
>>
>> Please look into these failed tests.
>>
>> Failed tests:
>>
>> ClusterAdminRpcServiceTest.testFlipMemberVotingStates:976->lambda$testFlipMemberVotingStates$8:978
>> Expected leader member-1. Actual:
>> member-1-shard-cars-oper_testFlipMemberVotingStates
>>
>> Tests run: 17, Failures: 1, Errors: 0, Skipped: 0
>>
>
>
> I ran it successfully 500 times locally. But looking at the code and the
> test output from Jenkins, I can see why it failed: just the right timing
> sequence coupled with a random thread-execution delay, and a deadline
> timeout in the test set just a tad too low to absorb that delay. I'll push
> a patch. It's another case where a slight delay or slowdown in the Jenkins
> environment occasionally throws off the timing enough to cause a test
> failure.
>

Hi Tom,

I'm curious: when you said you ran it successfully 500 times locally, did
you perform a full build during those runs, or did you run the single test
case in isolation?
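
If the root cause really is a test deadline set a tad too low, then as a
general pattern a polling assertion with one generous overall timeout tends
to be more forgiving of scheduler hiccups than a single check after a fixed
delay. This is only a rough illustration, not the actual test code;
getCurrentLeader() here is a hypothetical stand-in for however the real test
looks up the shard leader:

// Rough sketch only -- not the actual ClusterAdminRpcServiceTest code.
// Polls for the expected leader under one generous deadline instead of
// asserting once after a fixed delay. getCurrentLeader() is a hypothetical
// stand-in for however the real test queries the shard leader.
import java.util.concurrent.TimeUnit;

final class LeaderAssertion {

    static void awaitLeader(String expectedLeader, long timeoutSeconds)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(timeoutSeconds);
        String actual = null;
        while (System.nanoTime() < deadline) {
            actual = getCurrentLeader();
            if (expectedLeader.equals(actual)) {
                return;                // leadership settled within the deadline
            }
            Thread.sleep(100);         // brief back-off before re-checking
        }
        throw new AssertionError(
            "Expected leader " + expectedLeader + ". Actual: " + actual);
    }

    // Stand-in for the real leader lookup; returns the current leader id.
    private static String getCurrentLeader() {
        return "member-1";
    }
}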

While troubleshooting the bgpcep issue in the bgp-bmp-mock thread [0], I
found that I had to run a full bgpcep build in order to reproduce the issue
on my own laptop. I have a script, which I'm testing now and making more
generic, that I will share with this list later. It will let us run builds
continuously, whether autorelease or project-specific, over and over
indefinitely, and capture the Maven output plus the surefire logs, which I
hope will help folks reproduce intermittent issues locally.
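
To give a rough idea of what the script does, here is an illustration in
Java rather than the script itself; the real thing is a shell wrapper, and
the Maven invocation and log paths below are only assumptions:

// Rough illustration of the build loop -- not the actual script.
// Runs the build repeatedly, writing each run's console output to its own
// log file, and stops on the first failure so the surefire-reports
// directories are left in place for inspection.
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class BuildLoop {
    public static void main(String[] args) throws IOException, InterruptedException {
        Path logDir = Paths.get("build-loop-logs");
        Files.createDirectories(logDir);
        for (int run = 1; ; run++) {                   // run indefinitely
            File log = logDir.resolve("run-" + run + ".log").toFile();
            Process build = new ProcessBuilder("mvn", "clean", "install")
                    .redirectErrorStream(true)         // merge stderr into the log
                    .redirectOutput(log)
                    .start();
            int exit = build.waitFor();
            System.out.println("Run " + run + " finished with exit code " + exit);
            if (exit != 0) {
                break;                                 // keep the failing workspace
            }
        }
    }
}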

I feel like blaming the infrastructure for being "slow" is too easy an
excuse for these issues. If the software were running in a customer's
production environment, I suspect that telling the customer their hardware
is too slow, or that it isn't the same hardware as the developer's laptop,
would not be an answer the customer is happy with.

I'm not sure what we can do to build more confidence in the infrastructure
so that it isn't the first thing blamed every time there's a build issue,
but we do run on build flavors in vexxhost that provide dedicated CPUs and
RAM to our builders. Once I have some more validation of the infinite-build
script, maybe I can run it for a while on every autorelease-managed project,
using my 2 laptops plus a few vexxhost instances, and report the script
output back to the projects.

Regards,
Thanh

[0] https://lists.opendaylight.org/pipermail/release/2018-July/015594.html