Hi Vratko,

Here is an update on the OpenFlow cluster issues:

> 1) table miss flow only pushed by 1 instance (new bug):
> https://bugs.opendaylight.org/show_bug.cgi?id=7770

There is already a candidate fix.

> 2) restart of device owner in non-HA scenarios does not work (old bug):
> https://bugs.opendaylight.org/show_bug.cgi?id=6459

This issue will be addressed in another bug:
https://bugs.opendaylight.org/show_bug.cgi?id=7763

> 3) Openflow cluster performance issues (old bug):
> https://bugs.opendaylight.org/show_bug.cgi?id=6755

I opened a bug in the controller project to better understand the ERRORs in the log:
https://bugs.opendaylight.org/show_bug.cgi?id=7901

Unfortunately, last week we found a new issue in Carbon:
https://bugs.opendaylight.org/show_bug.cgi?id=7884

BR/Luis

> On Feb 9, 2017, at 5:32 PM, Luis Gomez <ece...@gmail.com> wrote:
>
> Hi Vratko,
>
> I investigated the issue I mentioned to you and created a bug for it.
> Currently we have these cluster-related bugs in OpenFlow identified by the
> system test (there could be more):
>
> 1) table miss flow only pushed by 1 instance (new bug):
> https://bugs.opendaylight.org/show_bug.cgi?id=7770
> 2) restart of device owner in non-HA scenarios does not work (old bug):
> https://bugs.opendaylight.org/show_bug.cgi?id=6459
> 3) Openflow cluster performance issues (old bug):
> https://bugs.opendaylight.org/show_bug.cgi?id=6755
>
> As you said, it is unclear whether the OpenFlow cluster issues are OpenFlow or
> cluster related. All bugs are now in the OpenFlow queue, and I would expect
> OpenFlow devs to move them to the cluster queue if that is where they belong.
>
> BR/Luis
>
>
>> On Feb 7, 2017, at 10:34 AM, Luis Gomez <ece...@gmail.com> wrote:
>>
>>
>>> On Feb 7, 2017, at 10:03 AM, Vratko Polak -X (vrpolak - PANTHEON
>>> TECHNOLOGIES at Cisco) <vrpo...@cisco.com> wrote:
>>>
>>> Two more questions.
>>>
>>> > https://jenkins.opendaylight.org/releng/view/CSIT-3node/job/openflowplugin-csit-3node-clustering-only-boron/
>>> > Cluster non HA test
>>>
>>> I just realized 1) and 2) are the same job.
>>> I am not sure which of the six suites [1]
>>> you are referring to.
>>
>> Typo, this is the link for non-HA:
>> https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-3node-periodic-bulkomatic-clustering-daily-only-boron/
>>>
>>> >> but other tests are not, I will have to investigate this.
>>> >
>>> > Keep us informed.
>>>
>>> Do you have an ETA?
>>
>> I would say in the next 2 weeks I will have something in place for cluster
>> scalability.
>>
>>>
>>> Vratko.
>>>
>>> [1]
>>> https://logs.opendaylight.org/releng/jenkins092/openflowplugin-csit-3node-clustering-only-carbon/470/archives/log.html.gz
>>>
>>> From: Vratko Polak -X (vrpolak - PANTHEON TECHNOLOGIES at Cisco)
>>> Sent: 7 February, 2017 15:05
>>> To: 'Luis Gomez' <ece...@gmail.com>
>>> Cc: integration-...@lists.opendaylight.org;
>>> controller-dev@lists.opendaylight.org; openflowplugin-dev
>>> <openflowplugin-...@lists.opendaylight.org>
>>> Subject: RE: [integration-dev] Clustering acceptance tests
>>>
>>> Thanks Luis.
>>>
>>> > but other tests are not, I will have to investigate this.
>>>
>>> Keep us informed.
>>>
>>> > 3) & 4) is probably controller cluster limitation.
>>>
>>> Both jobs occasionally pass,
>>> and I have opened a Bug [0] for exceptions in the karaf log.
>>> To me, it looks like an error in OpenflowPlugin
>>> (as opposed to Controller) code.
>>>
>>> > writing very fast (REST or internal app) on a shard follower DS, and
>>> > reading on the other follower.
>>>
>>> We plan to expand controller-csit-3node-rest-clust-cars-perf-only-carbon,
>>> not sure yet whether this scenario will be included.
>>>
>>> Vratko.
>>>
>>> [0] https://bugs.opendaylight.org/show_bug.cgi?id=7750
>>>
>>> From: Luis Gomez [mailto:ece...@gmail.com]
>>> Sent: 7 February, 2017 08:35
>>> To: Vratko Polak -X (vrpolak - PANTHEON TECHNOLOGIES at Cisco)
>>> <vrpo...@cisco.com>
>>> Cc: integration-...@lists.opendaylight.org;
>>> controller-dev@lists.opendaylight.org; openflowplugin-dev
>>> <openflowplugin-...@lists.opendaylight.org>
>>> Subject: Re: [integration-dev] Clustering acceptance tests
>>>
>>> Here is what I know from the OpenFlow plugin (cc-ing ofplugin devs):
>>>
>>> * Does your project have a test plan mentioning specific cluster scenarios?
>>>
>>> No written test plan, but we are running a bunch of cluster tests.
>>>
>>>
>>> * Do you have any of such scenarios implemented as Robot suites?
>>>
>>> 1)
>>> https://jenkins.opendaylight.org/releng/view/CSIT-3node/job/openflowplugin-csit-3node-clustering-only-boron/
>>> -> Cluster HA test (DPN connect to all nodes), it used to pass except for
>>> 1 test (member isolation with iptables), now I see this test is stable but
>>> other tests are not, I will have to investigate this.
>>>
>>> 2)
>>> https://jenkins.opendaylight.org/releng/view/CSIT-3node/job/openflowplugin-csit-3node-clustering-only-boron/
>>> -> Cluster non HA test (DPN connect to 1 node), failing because of this old
>>> bug: https://bugs.opendaylight.org/show_bug.cgi?id=6459.
>>>
>>> 3)
>>> https://jenkins.opendaylight.org/releng/view/CSIT-3node/job/openflowplugin-csit-3node-periodic-bulkomatic-clustering-perf-daily-only-boron/
>>> -> Max flows/sec using bulk-o-matic DS on cluster setup. Not fully working
>>> because of some cluster backend limitation:
>>> https://bugs.opendaylight.org/show_bug.cgi?id=6755
>>>
>>> 4)
>>> https://jenkins.opendaylight.org/releng/view/CSIT-3node/job/openflowplugin-csit-3node-periodic-restconf-clustering-perf-daily-only-boron/
>>> -> Max flows/sec using NB REST on cluster setup, this never worked very
>>> well because of the previous bug.
>>>
>>> * Do the robot suites have failures, suspected to be caused by clustering
>>> (as opposed to application logic, or mistakes in Robot code)?
>>>
>>> So far I think the issue in 2) is in the OpenFlow cluster implementation and the issue in
>>> 3) & 4) is probably controller cluster limitation.
>>>
>>>
>>> * Are there open Bugs corresponding to the clustering failures?
>>>
>>> Yes, except for 1), which will require some analysis of the unstable tests.
>>>
>>>
>>> * Are you planning to implement more Robot 3node suites until Carbon
>>> release?
>>>
>>> I will probably replace 1 of the performance suites (no point in running 2 if
>>> they do not work) with a cluster switch scalability test.
>>>
>>>
>>> * Are there scenarios you would like Controller team to cover using mock
>>> apps?
>>>
>>> I think the issue in 3) & 4) could be reproduced in the controller project by just
>>> writing very fast (REST or internal app) on a shard follower DS, and
>>> reading on the other follower.
>>>
>>> On Feb 6, 2017, at 5:31 AM, Vratko Polak -X (vrpolak - PANTHEON
>>> TECHNOLOGIES at Cisco) <vrpo...@cisco.com> wrote:
>>>
>>> Hello Test Contacts.
>>>
>>> In the Controller project, our highest priority
>>> for the Carbon release is to make sure ODL clustering
>>> is usable and stable.
>>>
>>> We are in the phase of formulating explicit acceptance criteria,
>>> so we can create an execution plan for turning them into Robot suites.
>>>
>>> Of course, clustering is not very useful just by itself;
>>> it is a tool applications can use to reach their goals.
>>> So real acceptance criteria for clustering should also
>>> take into account whether ODL applications can work in a cluster.
>>>
>>> Many projects are already running their 3node CSIT tests,
>>> but some important scenarios might not be covered yet,
>>> and some suites might be too unstable to serve as acceptance tests.
>>>
>>> The Controller team is small and busy, so we are asking for help.
>>> Here is a set of quick questions for test contacts:
>>> * Does your project have a test plan mentioning specific cluster scenarios?
>>> * Do you have any of such scenarios implemented as Robot suites?
>>> * Do the robot suites have failures, suspected to be caused by clustering
>>> (as opposed to application logic, or mistakes in Robot code)?
>>> * Are there open Bugs corresponding to the clustering failures?
>>> * Are you planning to implement more Robot 3node suites until Carbon
>>> release?
>>> * Are there scenarios you would like Controller team to cover using mock
>>> apps?
>>>
>>> Vratko (as a Controller test contact).
>>> _______________________________________________
>>> integration-dev mailing list
>>> integration-...@lists.opendaylight.org
>>> https://lists.opendaylight.org/mailman/listinfo/integration-dev
>
_______________________________________________
controller-dev mailing list
controller-dev@lists.opendaylight.org
https://lists.opendaylight.org/mailman/listinfo/controller-dev