On 11/6/18 6:00 PM, Jamo Luhrsen wrote:
>
>
> On 11/5/18 6:31 PM, Thanh Ha wrote:
>> Adding integration-dev as I don't think this is an infra issue. I did some
>> triaging and found the following:
>>
>> 1. Robot VM is still running and SSH is still accessible when the failure occurs
>> 2. CPU / RAM / Storage are all sufficient during the failure
>>
>> What is happening, however, is that something is causing the Jenkins Java SSH
>> connection to close at exactly 14 minutes into the job every time, thus
>> causing Jenkins to believe the VM is no longer reachable.
>>
>> I'm suspicious that something is happening in the robot run that is breaking
>> the Jenkins SSH connection. Have any other projects seen this same failure
>> in their CSIT jobs too?
>
>
> Yeah, I saw this in a job and dismissed it as an infra instability and
> didn't look any deeper. I think it was a 3node netvirt csit job.
>
> I will keep an eye out for it happening more.

Here it is in a 1node controller CSIT job:

https://jenkins.opendaylight.org/releng/job/controller-csit-1node-benchmark-all-fluorine/213/console

JamO

> JamO
>
>
>
>> Regards,
>> Thanh
>>
>>
>> On Tue, Nov 6, 2018 at 8:53 AM Thanh Ha <[email protected]> wrote:
>>
>> Hi Lori,
>>
>> Sounds like a problem that might be difficult to sort out but I'll poke
>> at this today and see if I can find some clues.
>>
>> Regards,
>> Thanh
>>
>> On Mon, Nov 5, 2018 at 4:51 PM Lori Jakab <[email protected]> wrote:
>>
>> [adding helpdesk, not sure who is monitoring infrastructure@]
>>
>> On Fri, Nov 2, 2018 at 2:02 PM Lori Jakab <[email protected]> wrote:
>> >
>> > Hi,
>> >
>> > For a while the lispflowmapping performance tests on Jenkins have been
>> > failing, first intermittently, but now the Neon and Oxygen tests fail
>> > almost always:
>> >
>> > https://jenkins.opendaylight.org/releng/view/lispflowmapping/job/lispflowmapping-csit-1node-performance-only-neon/
>> >
>> > https://jenkins.opendaylight.org/releng/view/lispflowmapping/job/lispflowmapping-csit-1node-performance-only-fluorine/
>> >
>> > https://jenkins.opendaylight.org/releng/view/lispflowmapping/job/lispflowmapping-csit-1node-performance-only-oxygen/
>> >
>> > This is the error message that I found most likely to be useful:
>> > "Caused: java.io.IOException: Backing channel
>> > 'prd-centos7-robot-2c-8g-42785' is disconnected." see the bottom of
>> > the full console log:
>> >
>> > https://jenkins.opendaylight.org/releng/view/lispflowmapping/job/lispflowmapping-csit-1node-performance-only-neon/90/console
>> >
>> > Performance jobs use two VMs for the tests, and it looks like during
>> > the tests the connection from the main
>>
>> Request Test Traffic FATAL:
>> command execution failed
>>
>> java.io.EOFException
>>
>> > VM to the slave is broken.
>> > I couldn't find any clues for the root of the problem in these logs.
>> >
>> > Any ideas on how to fix this? Unless the problem is fixed, these tests
>> > just waste infra resources, so the sensible thing to do would be to
>> > disable them, which is not the outcome I would prefer. The only other
>> > project that seems to still have performance tests is SXP; their tests
>> > at least finish, but not without failures, so I don't know how much
>> > they are affected by this issue. MDSAL used to have performance tests
>> > too, but I can't find them anymore.
>> >
>> > -Lori
>> _______________________________________________
>> infrastructure mailing list
>> [email protected]
>> https://lists.opendaylight.org/mailman/listinfo/infrastructure
>>
>>
>> _______________________________________________
>> integration-dev mailing list
>> [email protected]
>> https://lists.opendaylight.org/mailman/listinfo/integration-dev
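
For anyone checking whether their own CSIT jobs hit the same failure, here is a rough sketch of how a saved console log could be scanned for the error strings quoted in this thread. It assumes the console output has already been downloaded as plain text; the function names and the choice of patterns are hypothetical, made up for illustration, and are not part of Jenkins or any releng tooling.

```python
import re

# Failure signatures quoted in this thread. Which strings are worth matching
# is an assumption; this is not an exhaustive list of Jenkins agent errors.
SIGNATURES = {
    "backing_channel": re.compile(r"Backing channel '([^']+)' is disconnected"),
    "eof": re.compile(r"java\.io\.EOFException"),
    "fatal": re.compile(r"FATAL: command execution failed"),
}


def find_disconnect_signatures(console_text):
    """Return {signature name: [1-based line numbers]} for every match."""
    hits = {}
    for lineno, line in enumerate(console_text.splitlines(), start=1):
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits.setdefault(name, []).append(lineno)
    return hits


def disconnected_nodes(console_text):
    """Return the sorted, de-duplicated agent names from 'Backing channel' errors."""
    return sorted(set(SIGNATURES["backing_channel"].findall(console_text)))
```

Calling find_disconnect_signatures() on the text of a failed run shows where in the log each signature appears, and disconnected_nodes() lists which robot VMs were reported as disconnected, which may help compare failures across projects.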
