Hi Vratko,

I investigated the issue I mentioned to you and filed a bug for it. We currently have these cluster-related bugs in OpenFlow identified by the system tests (there could be more):

1) Table-miss flow only pushed by 1 instance (new bug):
https://bugs.opendaylight.org/show_bug.cgi?id=7770

2) Restart of device owner in non-HA scenarios does not work (old bug):
https://bugs.opendaylight.org/show_bug.cgi?id=6459

3) OpenFlow cluster performance issues (old bug):
https://bugs.opendaylight.org/show_bug.cgi?id=6755

As you said, it is unclear whether the OpenFlow cluster issues are OpenFlow or cluster related. All bugs are now in the OpenFlow queue, and I would expect the OpenFlow devs to move them to the cluster queue if that is where they belong.

BR/Luis

> On Feb 7, 2017, at 10:34 AM, Luis Gomez <ece...@gmail.com> wrote:
>
>> On Feb 7, 2017, at 10:03 AM, Vratko Polak -X (vrpolak - PANTHEON
>> TECHNOLOGIES at Cisco) <vrpo...@cisco.com> wrote:
>>
>> Two more questions.
>>
>> > https://jenkins.opendaylight.org/releng/view/CSIT-3node/job/openflowplugin-csit-3node-clustering-only-boron/
>> > Cluster non HA test
>>
>> I just realized 1) and 2) are the same job.
>> I am not sure which of the six suites [1]
>> are you referring to.
>
> Typo, this is the link for non-HA:
> https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-3node-periodic-bulkomatic-clustering-daily-only-boron/
>
>>
>> >> but other tests are not, I will have to investigate this.
>> >
>> > Keep us informed.
>>
>> Do you have an ETA?
>
> I would say in the next 2 weeks I will have something in place for cluster
> scalability.
>
>>
>> Vratko.
>>
>> [1] https://logs.opendaylight.org/releng/jenkins092/openflowplugin-csit-3node-clustering-only-carbon/470/archives/log.html.gz
>>
>> From: Vratko Polak -X (vrpolak - PANTHEON TECHNOLOGIES at Cisco)
>> Sent: 7 February, 2017 15:05
>> To: 'Luis Gomez' <ece...@gmail.com>
>> Cc: integration-...@lists.opendaylight.org;
>> controller-dev@lists.opendaylight.org;
>> openflowplugin-dev <openflowplugin-...@lists.opendaylight.org>
>> Subject: RE: [integration-dev] Clustering acceptance tests
>>
>> Thanks Luis.
>>
>> > but other tests are not, I will have to investigate this.
>>
>> Keep us informed.
>>
>> > 3) & 4) is probably controller cluster limitation.
>>
>> Both jobs occasionally pass,
>> and I have opened a Bug [0] for exceptions in karaf log.
>> To me, it looks like an error in OpenflowPlugin
>> (as opposed to Controller) code.
>>
>> > writing very fast (REST or internal app) on a shard follower DS, and
>> > reading on the other follower.
>>
>> We plan to expand controller-csit-3node-rest-clust-cars-perf-only-carbon,
>> not sure yet whether this scenario will be included.
>>
>> Vratko.
>>
>> [0] https://bugs.opendaylight.org/show_bug.cgi?id=7750
>>
>> From: Luis Gomez [mailto:ece...@gmail.com]
>> Sent: 7 February, 2017 08:35
>> To: Vratko Polak -X (vrpolak - PANTHEON TECHNOLOGIES at Cisco)
>> <vrpo...@cisco.com>
>> Cc: integration-...@lists.opendaylight.org;
>> controller-dev@lists.opendaylight.org;
>> openflowplugin-dev <openflowplugin-...@lists.opendaylight.org>
>> Subject: Re: [integration-dev] Clustering acceptance tests
>>
>> Here is what I know from OpenFlow plugin (cc-ing ofplugin devs):
>>
>> * Does your project have a test plan mentioning specific cluster scenarios?
>>
>> Not written test plan but we are running a bunch of cluster tests.
>>
>> * Do you have any of such scenarios implemented as Robot suites?
>>
>> 1) https://jenkins.opendaylight.org/releng/view/CSIT-3node/job/openflowplugin-csit-3node-clustering-only-boron/
>> -> Cluster HA test (DPN connect to all nodes), it used to pass except for
>> 1 test (member isolation with iptables), now I see this test is stable but
>> other tests are not, I will have to investigate this.
>>
>> 2) https://jenkins.opendaylight.org/releng/view/CSIT-3node/job/openflowplugin-csit-3node-clustering-only-boron/
>> -> Cluster non HA test (DPN connect to 1 node), failing because this old
>> bug: https://bugs.opendaylight.org/show_bug.cgi?id=6459.
>>
>> 3) https://jenkins.opendaylight.org/releng/view/CSIT-3node/job/openflowplugin-csit-3node-periodic-bulkomatic-clustering-perf-daily-only-boron/
>> -> Max flows/sec using bulk-o-matic DS on cluster setup. Not fully working
>> because some cluster backend limitation
>> https://bugs.opendaylight.org/show_bug.cgi?id=6755
>>
>> 4) https://jenkins.opendaylight.org/releng/view/CSIT-3node/job/openflowplugin-csit-3node-periodic-restconf-clustering-perf-daily-only-boron/
>> -> Max flows/sec using NB REST on cluster setup, this never worked very
>> good because previous bug.
>>
>> * Do the robot suites have failures, suspected to be caused by clustering
>> (as opposed to application logic, or mistakes in Robot code)?
>>
>> So far I think issue in 2) is OpenFlow cluster implementation and issue in
>> 3) & 4) is probably controller cluster limitation.
>>
>> * Are there open Bugs corresponding to the clustering failures?
>>
>> Yes, except for 1) that will require some analysis on the unstable tests.
>>
>> * Are you planning to implement more Robot 3node suites until Carbon release?
>>
>> I will probably replace 1 of the performance suites (no point to run 2 if
>> they do not work) by a cluster switch scalability test.
>>
>> * Are there scenarios you would like Controller team to cover using mock
>> apps?
>>
>> I think issue in 3) & 4) could be reproduced in controller project by just
>> writing very fast (REST or internal app) on a shard follower DS, and reading
>> on the other follower.
>>
>> On Feb 6, 2017, at 5:31 AM, Vratko Polak -X (vrpolak - PANTHEON TECHNOLOGIES
>> at Cisco) <vrpo...@cisco.com> wrote:
>>
>> Hello Test Contacts.
>>
>> In Controller project, our highest priority
>> for Carbon release is to make sure ODL clustering
>> is usable and stable.
>>
>> We are in the phase of formulating explicit acceptance criteria,
>> so we can create execution plan for turning them into Robot suites.
>>
>> Of course, clustering is not very useful just by itself,
>> it is used as a tool applications can use to reach their goals.
>> So real acceptance criteria for clustering should also
>> take into account whether ODL applications can work in cluster.
>>
>> Many projects are already running their 3node CSIT tests,
>> but on one hand, some important scenarios might be not covered yet,
>> and some suites might be too unstable to serve as acceptance tests.
>>
>> Controller team is small and busy, so we are asking for help.
>> Here is a set of quick questions for test contacts:
>> * Does your project have a test plan mentioning specific cluster scenarios?
>> * Do you have any of such scenarios implemented as Robot suites?
>> * Do the robot suites have failures, suspected to be caused by clustering
>> (as opposed to application logic, or mistakes in Robot code)?
>> * Are there open Bugs corresponding to the clustering failures?
>> * Are you planning to implement more Robot 3node suites until Carbon release?
>> * Are there scenarios you would like Controller team to cover using mock
>> apps?
>>
>> Vratko (as a Controller test contact).
>> _______________________________________________
>> integration-dev mailing list
>> integration-...@lists.opendaylight.org
>> https://lists.opendaylight.org/mailman/listinfo/integration-dev