I think NetVirt can sign off on the clustering issues. Jamo can take a look when he is up and also sign off. The two jobs [2] and [3] are showing better results.
Sometimes there are random failures where a node does not come back properly, such as in job [4]. We try to bring ODL1 back into the cluster, but it fails to come back within 5 minutes. Then we move on to the next tests and they fail. That ODL1 is hitting the issue below. Is there anything we can do to get past that? We can increase the timeout, but why is the cluster in a bad shape? I don't think the infra is loaded, since everything else is moving along properly - the Robot VM is driving the other two nodes. We can also see ODL1 restarting but taking its time in the failing case.

NetVirt is hitting the issue in bug 9006. The NetVirt tests copied the openflowplugin test pattern: take a node down, bring it back, then wait 5 minutes. What I don't understand is why taking 1 node down out of the three leads to instability. We have three nodes in the cluster. Take 1 down, leave the other 2 alone, attempt to bring back the 1 node, wait 5 minutes; that fails, and now the cluster is in a bad state, causing the further tests to fail.

2017-08-25 02:02:38,430 | WARN | saction-32-34'}} | DeadlockMonitor | 126 - org.opendaylight.controller.config-manager - 0.6.2.SNAPSHOT | ModuleIdentifier{factoryName='runtime-generated-mapping', instanceName='runtime-mapping-singleton'} did not finish after 284864 ms

[2] https://jenkins.opendaylight.org/releng/user/shague/my-views/view/3node/job/netvirt-csit-3node-openstack-ocata-gate-stateful-carbon/25/
[3] https://jenkins.opendaylight.org/releng/user/shague/my-views/view/3node/job/netvirt-csit-3node-openstack-ocata-upstream-stateful-carbon/
[4] https://logs.opendaylight.org/releng/jenkins092/netvirt-csit-3node-openstack-ocata-gate-stateful-carbon/24/log.html.gz#s1-s1-t13-k2-k2-k8
[5] https://git.opendaylight.org/gerrit/62256

On Thu, Aug 24, 2017 at 7:38 PM, Sam Hague <sha...@redhat.com> wrote:

> I am running some more tests now with the NetVirt CSIT that look promising,
> so it might not be a blocker for NetVirt.
> I am running a few more iterations now to know better.
>
> I had reduced the number of test suites so that we could capture just
> clustering issues. Doing so added a bug in the test code that was causing
> some extra failures. I have that fixed. If the next runs show we are back
> down to just a few clustering bugs, then we don't need a blocker from the
> NetVirt side. If we are lucky, the remaining issues are what is in this
> openflowplugin bug here.
>
> For reference, the last run [1] is looking better. It has a different test
> code bug in it, so ignore those failures, but please check the karaf.logs
> and see if you see any clustering issues. I don't think you will, since
> the killing of the ODL nodes is broken in this job. 21 and 21 should have
> that fixed.
>
> [1] https://logs.opendaylight.org/releng/jenkins092/netvirt-csit-3node-openstack-ocata-gate-stateful-carbon/20/
>
> On Thu, Aug 24, 2017 at 7:22 PM, Robert Varga <n...@hq.sk> wrote:
>
>> On 24/08/17 22:07, bugzilla-dae...@bugs.opendaylight.org wrote:
>> > *Comment # 14 <https://bugs.opendaylight.org/show_bug.cgi?id=9006#c14>
>> > on bug 9006 <https://bugs.opendaylight.org/show_bug.cgi?id=9006> from
>> > Luis Gomez <mailto:ece...@gmail.com> *
>> >
>> > OK, I think as a next step I can try to see if this reproduces outside CI.
>>
>> +infrastructure
>>
>> Guys, we are dealing with an issue which was first reported on
>> 8/17/2017, is blocking Carbon SR2 (due to NetVirt CSIT failing) and can
>> be either an infra or a code problem.
>>
>> The suspected trigger is the fix for
>> https://bugs.opendaylight.org/show_bug.cgi?id=8941 (merged on
>> 8/12/2017), which is a Carbon -> Carbon SR1 memory leak regression. If
>> that is the case, we need to identify and fix it, as a revert is not
>> really an option.
>>
>> Carbon/Nitrogen are synced up w.r.t. CDS, so this also impacts Nitrogen
>> (where it is a Carbon -> Nitrogen regression).
>>
>> Can you check with RS whether there are any issues and/or whether the
>> public cloud is experiencing issues?
>>
>> Given that inter-node network stability is in question, can we get a
>> limited-use set of slaves in the private cloud? Whatever is needed for
>> NetVirt CSIT is sufficient, and we only need to spin it up when we need
>> a really predictable environment... Should I file a helpdesk ticket?
>>
>> Thanks,
>> Robert
>>
>> _______________________________________________
>> infrastructure mailing list
>> infrastructure@lists.opendaylight.org
>> https://lists.opendaylight.org/mailman/listinfo/infrastructure
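For context on the pattern being debated in the thread: the openflowplugin step that NetVirt copied (take one node down, bring it back, wait up to 5 minutes for it to rejoin) is essentially a bounded rejoin poll. Below is a minimal shell sketch of that loop; the `node_is_up` probe is a hypothetical stand-in for the suites' actual cluster-status check, and none of the names here are taken from the real Robot code.

```shell
#!/bin/sh
# Sketch of the kill-and-rejoin wait described in the thread: after a
# controller is restarted, poll it until it rejoins the cluster, bounded
# by the same 5-minute window the CSIT suites use.

REJOIN_TIMEOUT=300   # the 5-minute window mentioned in the thread
POLL_INTERVAL=5      # seconds between liveness checks

# Hypothetical liveness probe; the real suites query cluster/shard status
# over REST rather than pinging the host.
node_is_up() {
    ping -c 1 -W 1 "$1" >/dev/null 2>&1
}

# Poll until the node reports up, or give up after REJOIN_TIMEOUT seconds.
wait_for_rejoin() {
    node="$1"
    elapsed=0
    while [ "$elapsed" -lt "$REJOIN_TIMEOUT" ]; do
        if node_is_up "$node"; then
            echo "node $node rejoined after ${elapsed}s"
            return 0
        fi
        sleep "$POLL_INTERVAL"
        elapsed=$((elapsed + POLL_INTERVAL))
    done
    echo "node $node did not rejoin within ${REJOIN_TIMEOUT}s" >&2
    return 1
}
```

Note that when this poll times out, the suite simply moves on to the next tests, which is exactly the scenario described above: the cluster is left with a half-returned member, and the later failures cascade from there.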