Adding integration-dev as I don't think this is an infra issue. I did some triaging and found the following:
1. The robot VM is still running and SSH is still accessible when the failure occurs.
2. CPU, RAM, and storage are all sufficient during the failure.

What is happening, however, is that something causes the Jenkins Java SSH connection to close at exactly 14 minutes into the job every time, which makes Jenkins believe the VM is no longer reachable. I suspect that something in the robot run is breaking the Jenkins SSH connection.

Have any other projects seen this same failure in their CSIT jobs?

Regards,
Thanh

On Tue, Nov 6, 2018 at 8:53 AM Thanh Ha <[email protected]> wrote:

> Hi Lori,
>
> Sounds like a problem that might be difficult to sort out but I'll poke at
> this today and see if I can find some clues.
>
> Regards,
> Thanh
>
> On Mon, Nov 5, 2018 at 4:51 PM Lori Jakab <[email protected]>
> wrote:
>
>> [adding helpdesk, not sure who is monitoring infrastructure@]
>>
>> On Fri, Nov 2, 2018 at 2:02 PM Lori Jakab <[email protected]>
>> wrote:
>> >
>> > Hi,
>> >
>> > For a while the lispflowmapping performance tests on Jenkins have been
>> > failing, first intermittently, but now the Neon and Oxygen tests fail
>> > almost always:
>> >
>> > https://jenkins.opendaylight.org/releng/view/lispflowmapping/job/lispflowmapping-csit-1node-performance-only-neon/
>> > https://jenkins.opendaylight.org/releng/view/lispflowmapping/job/lispflowmapping-csit-1node-performance-only-fluorine/
>> > https://jenkins.opendaylight.org/releng/view/lispflowmapping/job/lispflowmapping-csit-1node-performance-only-oxygen/
>> >
>> > This is the error message that I found most likely to be useful:
>> > "Caused: java.io.IOException: Backing channel
>> > 'prd-centos7-robot-2c-8g-42785' is disconnected."
>> > See the bottom of the full console log:
>> >
>> > https://jenkins.opendaylight.org/releng/view/lispflowmapping/job/lispflowmapping-csit-1node-performance-only-neon/90/console
>> >
>> > Performance jobs use two VMs for the tests, and it looks like during
>> > the tests the connection from the main VM to the slave is broken. I
>> > couldn't find any clues for the root of the problem in these logs.
>> >
>> >     Request Test Traffic
>> >     FATAL: command execution failed
>> >     java.io.EOFException
>> >
>> > Any ideas on how to fix this? Unless the problem is fixed, these tests
>> > just waste infra resources, so the sensible thing to do would be to
>> > disable them, which is not the outcome I would prefer. The only other
>> > project that seems to still have performance tests is SXP. Their tests
>> > at least finish, but not without failures, so I don't know how much
>> > they are affected by this issue. MDSAL used to have performance tests
>> > too, but I can't find them anymore.
>> >
>> > -Lori
>> _______________________________________________
>> infrastructure mailing list
>> [email protected]
>> https://lists.opendaylight.org/mailman/listinfo/infrastructure
