On 11/5/18 6:31 PM, Thanh Ha wrote:
> Adding integration-dev as I don't think this is an infra issue. I did some
> triaging and found the following:
>
> 1. Robot VM is still running and SSH is still accessible when the failure
>    occurs
> 2. CPU / RAM / Storage are all sufficient during the failure
>
> What is happening, however, is that something is causing the Jenkins Java
> SSH connection to close at exactly 14 minutes into the job every time, thus
> causing Jenkins to believe the VM is no longer reachable.
>
> I'm suspicious that something is happening in the robot run that is breaking
> the Jenkins SSH connection. Have any other projects seen this same failure
> in their CSIT jobs too?

Yeah, I saw this in a job and dismissed it as an infra instability and didn't
look any deeper. I think it was a 3node netvirt csit job. I will keep an eye
out for it happening more.

JamO

> Regards,
> Thanh
>
>
> On Tue, Nov 6, 2018 at 8:53 AM Thanh Ha wrote:
>
>     Hi Lori,
>
>     Sounds like a problem that might be difficult to sort out but I'll poke
>     at this today and see if I can find some clues.
>
>     Regards,
>     Thanh
>
>     On Mon, Nov 5, 2018 at 4:51 PM Lori Jakab wrote:
>
>         [adding helpdesk, not sure who is monitoring infrastructure@]
>
>         On Fri, Nov 2, 2018 at 2:02 PM Lori Jakab wrote:
>         >
>         > Hi,
>         >
>         > For a while the lispflowmapping performance tests on Jenkins have
>         > been failing, first intermittently, but now the Neon and Oxygen
>         > tests fail almost always:
>         >
>         > https://jenkins.opendaylight.org/releng/view/lispflowmapping/job/lispflowmapping-csit-1node-performance-only-neon/
>         > https://jenkins.opendaylight.org/releng/view/lispflowmapping/job/lispflowmapping-csit-1node-performance-only-fluorine/
>         > https://jenkins.opendaylight.org/releng/view/lispflowmapping/job/lispflowmapping-csit-1node-performance-only-oxygen/
>         >
>         > This is the error message that I found most likely to be useful:
>         > "Caused: java.io.IOException: Backing channel
>         > 'prd-centos7-robot-2c-8g-42785' is disconnected." See the bottom
>         > of the full console log:
>         >
>         > https://jenkins.opendaylight.org/releng/view/lispflowmapping/job/lispflowmapping-csit-1node-performance-only-neon/90/console
>         >
>         >     Request Test Traffic FATAL: command execution failed
>         >     java.io.EOFException
>         >
>         > Performance jobs use two VMs for the tests, and it looks like
>         > during the tests the connection from the main VM to the slave is
>         > broken. I couldn't find any clues for the root of the problem in
>         > these logs.
>         >
>         > Any ideas on how to fix this? Unless the problem is fixed, these
>         > tests just waste infra resources, so the sensible thing to do
>         > would be to disable them, which is not the outcome I would prefer.
>         > The only other project that seems to still have performance tests
>         > is SXP; their tests at least finish, but not without failures, so
>         > I don't know how much they are affected by this issue. MDSAL used
>         > to have performance tests too, but I can't find them anymore.
>         >
>         > -Lori
>     _______________________________________________
>     infrastructure mailing list
>     https://lists.opendaylight.org/mailman/listinfo/infrastructure
>
>
> _______________________________________________
> integration-dev mailing list
> https://lists.opendaylight.org/mailman/listinfo/integration-dev
