On 11/8/18 10:14 AM, Jamo Luhrsen wrote:
> 
> 
> On 11/6/18 6:00 PM, Jamo Luhrsen wrote:
>>
>>
>> On 11/5/18 6:31 PM, Thanh Ha wrote:
>>> Adding integration-dev as I don't think this is an infra issue. I did some 
>>> triaging and found the following:
>>>
>>> 1. Robot VM is still running and SSH is still accessible when the failure occurs
>>> 2. CPU / RAM / Storage all sufficient during the failure
>>>
>>> What is happening, however, is that something is causing the Jenkins Java
>>> SSH connection to close at exactly 14 minutes into the job every time,
>>> which leads Jenkins to believe the VM is no longer reachable.
>>>
>>> I'm suspicious that something is happening in the robot run that is
>>> breaking the Jenkins SSH connection. Have any
>>> other projects seen this same failure in their CSIT jobs too?
>>
>>
>> Yeah, I saw this in a job and dismissed it as infra instability and
>> didn't look any deeper. I think it was a 3node netvirt CSIT job.
>>
>> I will keep an eye out for it happening more.
> 
> here it is in a 1node controller CSIT job:
> https://jenkins.opendaylight.org/releng/job/controller-csit-1node-benchmark-all-fluorine/213/console

another:
https://jenkins.opendaylight.org/releng/job/controller-csit-3node-rest-clust-cars-perf-tell-only-fluorine/100/console
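
In case it helps narrow this down: assuming the cutoff at roughly 14 minutes
is an idle/keepalive timeout on the robot VM's sshd rather than anything the
suite itself does (that's only a guess on my part), the server side could be
checked on a held VM with something like:

    # dump sshd's effective keepalive/idle settings (CentOS 7 robot image)
    sudo sshd -T | grep -iE 'clientaliveinterval|clientalivecountmax|tcpkeepalive'

    # look for the reason sshd recorded for the disconnect around the failure time
    sudo grep -iE 'disconnect|timeout' /var/log/secure
    sudo journalctl -u sshd --since "2 hours ago"

If sshd logs a clean disconnect right at the 14 minute mark, that would point
at a timeout between Jenkins and the VM rather than at the robot run itself.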

JamO

> JamO
> 
> 
>> JamO
>>
>>
>>
>>> Regards,
>>> Thanh
>>>
>>>
>>> On Tue, Nov 6, 2018 at 8:53 AM Thanh Ha <[email protected]> wrote:
>>>
>>>     Hi Lori,
>>>
>>>     Sounds like a problem that might be difficult to sort out, but I'll
>>>     poke at this today and see if I can find some clues.
>>>
>>>     Regards,
>>>     Thanh
>>>
>>>     On Mon, Nov 5, 2018 at 4:51 PM Lori Jakab <[email protected]> wrote:
>>>
>>>         [adding helpdesk, not sure who is monitoring infrastructure@]
>>>
>>>         On Fri, Nov 2, 2018 at 2:02 PM Lori Jakab <[email protected]> wrote:
>>>          >
>>>          > Hi,
>>>          >
>>>          > For a while the lispflowmapping performance tests on Jenkins
>>>          > have been failing, first intermittently, but now the Neon and
>>>          > Oxygen tests fail almost always:
>>>          >
>>>          > https://jenkins.opendaylight.org/releng/view/lispflowmapping/job/lispflowmapping-csit-1node-performance-only-neon/
>>>          > https://jenkins.opendaylight.org/releng/view/lispflowmapping/job/lispflowmapping-csit-1node-performance-only-fluorine/
>>>          > https://jenkins.opendaylight.org/releng/view/lispflowmapping/job/lispflowmapping-csit-1node-performance-only-oxygen/
>>>          >
>>>          > This is the error message that I found most likely to be useful:
>>>          > "Caused: java.io.IOException: Backing channel
>>>          > 'prd-centos7-robot-2c-8g-42785' is disconnected." see the bottom
>>>          > of the full console log:
>>>          >
>>>          > https://jenkins.opendaylight.org/releng/view/lispflowmapping/job/lispflowmapping-csit-1node-performance-only-neon/90/console
>>>          >
>>>          > Performance jobs use two VMs for the tests, and it looks like
>>>          > during the tests the connection from the main VM to the slave
>>>          > is broken. I couldn't find any clues for the root of the
>>>          > problem in these logs.
>>>
>>>         Request Test Traffic                                     FATAL:
>>>         command execution failed
>>>
>>>         java.io.EOFException
>>>          >
>>>          > Any ideas on how to fix this? Unless the problem is fixed,
>>>          > these tests just waste infra resources, so the sensible thing
>>>          > to do would be to disable them, which is not the outcome I
>>>          > would prefer. The only other project that seems to still have
>>>          > performance tests is SXP; their tests at least finish, but not
>>>          > without failures, so I don't know how much they are affected by
>>>          > this issue. MDSAL used to have performance tests too, but I
>>>          > can't find them anymore.
>>>          >
>>>          > -Lori

_______________________________________________
infrastructure mailing list
[email protected]
https://lists.opendaylight.org/mailman/listinfo/infrastructure
