Hi Vratko, here is an update on the OpenFlow cluster issues:

> 1) table miss flow only pushed by 1 instance (new bug): 
> https://bugs.opendaylight.org/show_bug.cgi?id=7770 
There is already a candidate fix.

> 2) restart of device owner in non-HA scenarios does not work (old bug): 
> https://bugs.opendaylight.org/show_bug.cgi?id=6459 
This issue will be addressed in another bug: 
https://bugs.opendaylight.org/show_bug.cgi?id=7763

> 3) Openflow cluster performance issues (old bug): 
> https://bugs.opendaylight.org/show_bug.cgi?id=6755 

I opened a bug in the controller project to better understand the ERRORs in the log: 
https://bugs.opendaylight.org/show_bug.cgi?id=7901 

Unfortunately, last week we became aware of a new issue in Carbon: 
https://bugs.opendaylight.org/show_bug.cgi?id=7884 

BR/Luis


> On Feb 9, 2017, at 5:32 PM, Luis Gomez <ece...@gmail.com> wrote:
> 
> Hi Vratko,
> 
> I investigated the issue I mentioned to you and created a bug for it. 
> Currently we have these cluster-related bugs in OpenFlow identified by the 
> system tests (there could be more):
> 
> 1) table miss flow only pushed by 1 instance (new bug): 
> https://bugs.opendaylight.org/show_bug.cgi?id=7770 
> 2) restart of device owner in non-HA scenarios does not work (old bug): 
> https://bugs.opendaylight.org/show_bug.cgi?id=6459 
> 3) Openflow cluster performance issues (old bug): 
> https://bugs.opendaylight.org/show_bug.cgi?id=6755 
> 
> As you said, it is unclear whether the OpenFlow cluster issues are OpenFlow- or 
> cluster-related. All bugs are now in the OpenFlow queue, and I would expect the 
> OpenFlow devs to move them to the cluster queue if that is where they belong.
> 
> BR/Luis
> 
> 
>> On Feb 7, 2017, at 10:34 AM, Luis Gomez <ece...@gmail.com> wrote:
>> 
>> 
>>> On Feb 7, 2017, at 10:03 AM, Vratko Polak -X (vrpolak - PANTHEON 
>>> TECHNOLOGIES at Cisco) <vrpo...@cisco.com> wrote:
>>> 
>>> Two more questions.
>>> 
>>> > https://jenkins.opendaylight.org/releng/view/CSIT-3node/job/openflowplugin-csit-3node-clustering-only-boron/
>>> > Cluster non HA test
>>>  
>>> I just realized 1) and 2) are the same job.
>>> I am not sure which of the six suites [1]
>>> you are referring to.
>> 
>> Typo, this is the link for non-HA: 
>> https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-3node-periodic-bulkomatic-clustering-daily-only-boron/
>>>  
>>> >> but other tests are not, I will have to investigate this.
>>> > 
>>> > Keep us informed.
>>>  
>>> Do you have an ETA?
>> 
>> I would say in the next 2 weeks I will have something in place for cluster 
>> scalability.
>> 
>>>  
>>> Vratko.
>>>  
>>> [1] 
>>> https://logs.opendaylight.org/releng/jenkins092/openflowplugin-csit-3node-clustering-only-carbon/470/archives/log.html.gz
>>>  
>>> From: Vratko Polak -X (vrpolak - PANTHEON TECHNOLOGIES at Cisco) 
>>> Sent: 7 February, 2017 15:05
>>> To: 'Luis Gomez' <ece...@gmail.com>
>>> Cc: integration-...@lists.opendaylight.org; 
>>> controller-dev@lists.opendaylight.org; openflowplugin-dev 
>>> <openflowplugin-...@lists.opendaylight.org>
>>> Subject: RE: [integration-dev] Clustering acceptance tests
>>>  
>>> Thanks Luis.
>>>  
>>> > but other tests are not, I will have to investigate this.
>>>  
>>> Keep us informed.
>>>  
>>> > 3) & 4) is probably controller cluster limitation.
>>>  
>>> Both jobs occasionally pass,
>>> and I have opened a Bug [0] for exceptions in karaf log.
>>> To me, it looks like an error in OpenflowPlugin
>>> (as opposed to Controller) code.
>>>  
>>> > writing very fast (REST or internal app) on a shard follower DS, and 
>>> > reading on the other follower.
>>>  
>>> We plan to expand controller-csit-3node-rest-clust-cars-perf-only-carbon,
>>> not sure yet whether this scenario will be included.
>>>  
>>> Vratko.
>>>  
>>> [0] https://bugs.opendaylight.org/show_bug.cgi?id=7750 
>>>  
>>> From: Luis Gomez [mailto:ece...@gmail.com] 
>>> Sent: 7 February, 2017 08:35
>>> To: Vratko Polak -X (vrpolak - PANTHEON TECHNOLOGIES at Cisco) 
>>> <vrpo...@cisco.com>
>>> Cc: integration-...@lists.opendaylight.org; 
>>> controller-dev@lists.opendaylight.org; openflowplugin-dev 
>>> <openflowplugin-...@lists.opendaylight.org>
>>> Subject: Re: [integration-dev] Clustering acceptance tests
>>>  
>>> Here is what I know from OpenFlow plugin (cc-ing ofplugin devs):
>>>  
>>> * Does your project have a test plan mentioning specific cluster scenarios?
>>>  
>>> No written test plan, but we are running a number of cluster tests.
>>>  
>>> 
>>> * Do you have any of such scenarios implemented as Robot suites?
>>>  
>>> 1) 
>>> https://jenkins.opendaylight.org/releng/view/CSIT-3node/job/openflowplugin-csit-3node-clustering-only-boron/
>>>  ->  Cluster HA test (DPN connects to all nodes). It used to pass except for 
>>> 1 test (member isolation with iptables); now I see this test is stable but 
>>> other tests are not, I will have to investigate this.
>>>  
>>> 2) 
>>> https://jenkins.opendaylight.org/releng/view/CSIT-3node/job/openflowplugin-csit-3node-clustering-only-boron/
>>>  -> Cluster non-HA test (DPN connects to 1 node), failing because of this old 
>>> bug: https://bugs.opendaylight.org/show_bug.cgi?id=6459
>>>  
>>> 3) 
>>> https://jenkins.opendaylight.org/releng/view/CSIT-3node/job/openflowplugin-csit-3node-periodic-bulkomatic-clustering-perf-daily-only-boron/
>>>  -> Max flows/sec using bulk-o-matic DS on cluster setup. Not fully working 
>>> because of a cluster backend limitation: 
>>> https://bugs.opendaylight.org/show_bug.cgi?id=6755
>>>  
>>> 4) 
>>> https://jenkins.opendaylight.org/releng/view/CSIT-3node/job/openflowplugin-csit-3node-periodic-restconf-clustering-perf-daily-only-boron/
>>>  -> Max flows/sec using NB REST on cluster setup; this never worked very 
>>> well because of the previous bug.
>>>  
>>> * Do the robot suites have failures, suspected to be caused by clustering
>>>   (as opposed to application logic, or mistakes in Robot code)?
>>>  
>>> So far I think the issue in 2) is in the OpenFlow cluster implementation and the issue in 
>>> 3) & 4) is probably controller cluster limitation.
>>>  
>>> 
>>> * Are there open Bugs corresponding to the clustering failures?
>>>  
>>> Yes, except for 1), which will require some analysis of the unstable tests.
>>>  
>>> 
>>> * Are you planning to implement more Robot 3node suites until Carbon 
>>> release?
>>>  
>>> I will probably replace one of the performance suites (there is no point in running 2 if 
>>> they do not work) with a cluster switch scalability test. 
>>>  
>>> 
>>> * Are there scenarios you would like Controller team to cover using mock 
>>> apps?
>>>  
>>> I think the issue in 3) & 4) could be reproduced in the controller project by just 
>>> writing very fast (REST or internal app) on a shard follower DS, and 
>>> reading on the other follower. 
>>>  
>>> On Feb 6, 2017, at 5:31 AM, Vratko Polak -X (vrpolak - PANTHEON 
>>> TECHNOLOGIES at Cisco) <vrpo...@cisco.com> wrote:
>>>  
>>> Hello Test Contacts.
>>>  
>>> In Controller project, our highest priority
>>> for Carbon release is to make sure ODL clustering
>>> is usable and stable.
>>>  
>>> We are in the phase of formulating explicit acceptance criteria,
>>> so we can create an execution plan for turning them into Robot suites.
>>>  
>>> Of course, clustering is not very useful by itself;
>>> it is a tool that applications use to reach their goals.
>>> So real acceptance criteria for clustering should also
>>> take into account whether ODL applications can work in cluster.
>>>  
>>> Many projects are already running their 3node CSIT tests,
>>> but some important scenarios might not be covered yet,
>>> and some suites might be too unstable to serve as acceptance tests.
>>>  
>>> Controller team is small and busy, so we are asking for help.
>>> Here is a set of quick questions for test contacts:
>>> * Does your project have a test plan mentioning specific cluster scenarios?
>>> * Do you have any of such scenarios implemented as Robot suites?
>>> * Do the robot suites have failures, suspected to be caused by clustering
>>>   (as opposed to application logic, or mistakes in Robot code)?
>>> * Are there open Bugs corresponding to the clustering failures?
>>> * Are you planning to implement more Robot 3node suites until Carbon 
>>> release?
>>> * Are there scenarios you would like Controller team to cover using mock 
>>> apps?
>>>  
>>> Vratko (as a Controller test contact).
>>> _______________________________________________
>>> integration-dev mailing list
>>> integration-...@lists.opendaylight.org 
>>> https://lists.opendaylight.org/mailman/listinfo/integration-dev 
> 

_______________________________________________
controller-dev mailing list
controller-dev@lists.opendaylight.org
https://lists.opendaylight.org/mailman/listinfo/controller-dev
