Adding integration-dev as I don't think this is an infra issue. I did some
triaging and found the following:

1. The robot VM is still running and SSH is still accessible when the failure occurs
2. CPU, RAM, and storage are all sufficient during the failure
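
For anyone who wants to repeat the first check while a job is running, here is
a minimal sketch, not the exact procedure used for the triage above. It only
tests whether TCP port 22 on the VM accepts connections. The host name
"robot-vm.example" is a placeholder, and the function names are hypothetical.

```python
import socket
import time


def is_port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


def probe(host, port=22, attempts=5, interval=1.0):
    """Probe host:port repeatedly; return (seconds_elapsed, reachable) pairs."""
    start = time.monotonic()
    results = []
    for i in range(attempts):
        reachable = is_port_open(host, port)
        results.append((round(time.monotonic() - start, 1), reachable))
        if i < attempts - 1:
            time.sleep(interval)
    return results


if __name__ == "__main__":
    # "robot-vm.example" is a placeholder, not the real VM address.
    for elapsed, ok in probe("robot-vm.example", attempts=3):
        print(f"{elapsed:6.1f}s  port 22 reachable: {ok}")
```

A successful probe only shows that sshd is still accepting new connections. It
says nothing about whether the existing Jenkins agent session is still alive.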

What is happening, however, is that something is causing the Jenkins Java SSH
connection to close at exactly 14 minutes into the job every time, which
causes Jenkins to believe the VM is no longer reachable.

I suspect that something in the robot run is breaking the Jenkins SSH
connection. Have any other projects seen this same failure in their CSIT
jobs?

Regards,
Thanh


On Tue, Nov 6, 2018 at 8:53 AM Thanh Ha <[email protected]>
wrote:

> Hi Lori,
>
> Sounds like a problem that might be difficult to sort out but I'll poke at
> this today and see if I can find some clues.
>
> Regards,
> Thanh
>
> On Mon, Nov 5, 2018 at 4:51 PM Lori Jakab <[email protected]>
> wrote:
>
>> [adding helpdesk, not sure who is monitoring infrastructure@]
>>
>> On Fri, Nov 2, 2018 at 2:02 PM Lori Jakab <[email protected]>
>> wrote:
>> >
>> > Hi,
>> >
>> > For a while the lispflowmapping performance tests on Jenkins have been
>> > failing, first intermittently, but now the Neon and Oxygen tests fail
>> > almost always:
>> >
>> >
>> https://jenkins.opendaylight.org/releng/view/lispflowmapping/job/lispflowmapping-csit-1node-performance-only-neon/
>> >
>> https://jenkins.opendaylight.org/releng/view/lispflowmapping/job/lispflowmapping-csit-1node-performance-only-fluorine/
>> >
>> https://jenkins.opendaylight.org/releng/view/lispflowmapping/job/lispflowmapping-csit-1node-performance-only-oxygen/
>> >
>> > This is the error message that I found most likely to be useful:
>> > "Caused: java.io.IOException: Backing channel
>> > 'prd-centos7-robot-2c-8g-42785' is disconnected." see the bottom of
>> > the full console log:
>> >
>> >
>> https://jenkins.opendaylight.org/releng/view/lispflowmapping/job/lispflowmapping-csit-1node-performance-only-neon/90/console
>> >
>> > Performance jobs use two VMs for the tests, and it looks like during
>> > the tests the connection from the main VM to the slave is broken. I
>> > couldn't find any clues for the root of the problem in these logs.
>
> Request Test Traffic                                     FATAL: command execution failed
> java.io.EOFException
>
>> >
>> > Any ideas on how to fix this? Unless the problem is fixed, these tests
>> > just waste infra resources, so the sensible thing to do would be to
>> > disable them, which is not the outcome I would prefer. The only other
>> > project that seems to still have performance tests is SXP. Their tests
>> > at least finish, but not without failures, so I don't know how much
>> > they are affected by this issue. MDSAL used to have performance tests
>> > too, but I can't find them anymore.
>> >
>> > -Lori
>> _______________________________________________
>> infrastructure mailing list
>> [email protected]
>> https://lists.opendaylight.org/mailman/listinfo/infrastructure
>>
>
