On Fri, Jul 20, 2018 at 10:01 AM Tom Pantelis <tompante...@gmail.com> wrote:
> On Fri, Jul 20, 2018 at 4:48 AM, Anil Belur <abe...@linuxfoundation.org> wrote:
>
>> On Fri, Jul 20, 2018 at 11:12 AM Jenkins <jenkins-dontre...@opendaylight.org> wrote:
>>
>>> Attention controller-devs,
>>>
>>> Autorelease oxygen failed to build sal-cluster-admin-impl from
>>> controller in build 359. Attached is a snippet of the error message
>>> related to the failure that we were able to automatically parse, as
>>> well as console logs.
>>>
>>> Console Logs:
>>> https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/autorelease-release-oxygen/359
>>>
>>> Jenkins Build:
>>> https://jenkins.opendaylight.org/releng/job/autorelease-release-oxygen/359/
>>>
>>> Please review and provide an ETA on when a fix will be available.
>>>
>>> Thanks,
>>> ODL releng/autorelease team
>>
>> Hello controller-dev:
>>
>> Please look into these failed tests.
>>
>> Failed tests:
>>   ClusterAdminRpcServiceTest.testFlipMemberVotingStates:976->lambda$testFlipMemberVotingStates$8:978
>>   Expected leader member-1. Actual: member-1-shard-cars-oper_testFlipMemberVotingStates
>>
>> Tests run: 17, Failures: 1, Errors: 0, Skipped: 0
>
> I ran it successfully 500 times locally. But looking at the code and the
> test output from Jenkins, I can see why it failed - just the right timing
> sequence coupled with just enough of a random thread execution delay, and
> a deadline timeout set by the test being just a tad too low for that delay.
> I'll push a patch. This is another case where it seems there's just enough
> of a slight delay or slowdown in the Jenkins environment to throw off
> timing and cause a test failure.

Hi Tom,

I'm curious: when you said you ran it successfully 500 times locally, did you
perform a full build each time, or did you run the single test case in
isolation? While troubleshooting the bgpcep issue in the bgp-bmp-mock thread
[0], I found that I had to run a full bgpcep build in order to reproduce the
issue on my own laptop.

I have a script that I'm testing now and making more generic, which I will
share with this list later. It will let us continuously run builds - whether
autorelease or project-specific - over and over indefinitely and capture the
Maven output plus the Surefire logs, which I hope will help folks reproduce
intermittent issues locally. A rough sketch of the idea is included below.

I feel like blaming the infrastructure for being "slow" is too easy an excuse
for these issues. If the software were running in a customer's production
environment, I suspect telling the customer that their hardware is too slow
and is not the same hardware as the developer's laptop would not be a
solution the customer would be happy with.

I'm not sure what we can do to give more confidence in the infrastructure so
that it's not the first thing that gets blamed every time there's a build
issue, but we do run on build flavors in Vexxhost that provide dedicated CPUs
and RAM to our builders.

Once I have some more validation on the infinite-build script, maybe I can run
it for a while against every autorelease-managed project and report back to
the projects with the script output from my two laptops plus a few Vexxhost
instances.

Regards,
Thanh

[0] https://lists.opendaylight.org/pipermail/release/2018-July/015594.html
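For readers who want to try the same loop-the-build approach before the releng
script is published, here is a minimal sketch of what such a script might look
like. This is not the actual script referred to above; the Maven command line,
the log directory layout, and the surefire-reports glob are all illustrative
assumptions.

#!/usr/bin/env python3
# Rough sketch: run a full Maven build in a loop and keep the console output
# and Surefire reports from every run, so intermittent failures can be
# reproduced and compared later. Paths and options are assumptions, not ODL
# releng tooling.
import shutil
import subprocess
import sys
from pathlib import Path

PROJECT_DIR = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
LOG_DIR = Path("build-loop-logs")
MVN_CMD = ["mvn", "clean", "install"]  # add profiles/flags as needed

def run_once(iteration: int) -> bool:
    """Run one full build, saving the Maven log and all Surefire reports."""
    run_dir = LOG_DIR / f"run-{iteration:04d}"
    run_dir.mkdir(parents=True, exist_ok=True)
    with open(run_dir / "maven.log", "w") as log:
        result = subprocess.run(MVN_CMD, cwd=PROJECT_DIR,
                                stdout=log, stderr=subprocess.STDOUT)
    # Copy every module's surefire-reports directory into this run's folder.
    for report_dir in PROJECT_DIR.glob("**/target/surefire-reports"):
        dest = run_dir / "surefire" / report_dir.parent.parent.name
        shutil.copytree(report_dir, dest, dirs_exist_ok=True)
    return result.returncode == 0

iteration = 0
while True:  # run indefinitely; stop with Ctrl+C or when a failure shows up
    iteration += 1
    ok = run_once(iteration)
    print(f"run {iteration}: {'SUCCESS' if ok else 'FAILURE'}")

Usage would be something like "python3 build_loop.py /path/to/controller"
(the script name is hypothetical), leaving it running overnight and then
inspecting the logs of any FAILURE runs.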