Robert, since we're talking about scalability, can you tell us how many nodes
the current Akka-based clustering can support at most? As I understand it,
current ODL clustering is more like a disaster-recovery backup for the data
store; I don't think it can work correctly if we have 128 nodes there.

In a cloud environment, tenants dynamically create and destroy VMs, which
installs and removes flows very often, and OpenFlow statistics collection also
puts non-trivial load on the plugin. With the current openflowplugin
clustering, one OVS node connects to 3 ODL nodes over permanent TCP
connections. How many OVS nodes can 3 ODL nodes support at most? Has anybody
tested it? I don't think it will surpass 100.

As I said, the config inventory holds about 2 MB of data in a 3-node
environment; you can estimate how much data there will be with 10000 nodes. Do
you think the current ODL replication mechanism can handle that?
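
(A back-of-envelope estimate, assuming the config inventory grows roughly
linearly with the number of managed nodes: 2 MB / 3 nodes is roughly 0.7 MB per
node, so 10000 nodes would mean on the order of 6-7 GB of config inventory,
replicated in full to every cluster member under the default module-based
sharding.)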

I know Pantheon has commercial deployments in production environments; can you
tell us how many devices/nodes you can support at most in a 3-node ODL
cluster?

Performance and scalability are two different things: we can always gain some
performance by optimizing, but scalability is not like that; we have to
redesign something to achieve it. Has any ODL developer ever considered how
ODL could support a 10000-node cloud environment? You are the MD-SAL experts,
so it would be great if you could share your insights about this here.

-----Original Message-----
From: Robert Varga [mailto:n...@hq.sk]
Sent: June 4, 2019 7:35
To: Anil Vishnoi <vishnoia...@gmail.com>; Yi Yang (杨燚)-云服务集团
<yangy...@inspur.com>
Cc: avish...@luminanetworks.com; openflowplugin-...@lists.opendaylight.org;
robert.va...@pantheon.tech; mdsal-...@lists.opendaylight.org;
abhijit.kumbh...@ericsson.com; d...@lists.opendaylight.org;
controller-dev@lists.opendaylight.org
Subject: Re: [controller-dev] Re: Is Read from follower shard ok and
openflowplugin master must be shard leader?

On 31/05/2019 20:39, Anil Vishnoi wrote:
> Hi Yi,
> 
> Please see inline...
> 
> On Thu, May 30, 2019 at 5:04 PM Yi Yang (杨燚)-云服务集团
> <yangy...@inspur.com> wrote:

[trim]

>     # Q2. Openflowplugin clustering also has a master; per its
>     documentation, only the openflowplugin master node can write to the
>     inventory data store, so what if this openflowplugin master node is a
>     shard follower?
> 
> OpenFlow plugin is driven by the devices connected to it in the
> clustered setup. OpenFlow plugin allows you to connect your device to
> any of the controller nodes (one or more), and internally it will
> decide which node of the cluster will be the owner/master of the
> device using the Cluster Singleton Service + EOS. Once the owner/master
> is decided, that owner/master is the one allowed to write data to the
> "operational" inventory (the plugin doesn't write to the config
> inventory).
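
(For reference, a minimal sketch of the MD-SAL Cluster Singleton Service
pattern referred to above, assuming the 2019-era mdsal singleton API; the
DeviceMastership class, the "openflow:1" identifier and the registration
comment are purely illustrative and not actual openflowplugin code:)

    import com.google.common.util.concurrent.Futures;
    import com.google.common.util.concurrent.ListenableFuture;
    import org.opendaylight.mdsal.singleton.common.api.ClusterSingletonService;
    import org.opendaylight.mdsal.singleton.common.api.ClusterSingletonServiceProvider;
    import org.opendaylight.mdsal.singleton.common.api.ClusterSingletonServiceRegistration;
    import org.opendaylight.mdsal.singleton.common.api.ServiceGroupIdentifier;

    // Illustrative only: one singleton "service" per connected device. The
    // Entity Ownership Service behind the provider guarantees that exactly
    // one cluster node has instantiateServiceInstance() active at a time.
    final class DeviceMastership implements ClusterSingletonService {
        private final ServiceGroupIdentifier identifier;

        DeviceMastership(final String deviceId) {
            this.identifier = ServiceGroupIdentifier.create(deviceId);
        }

        @Override
        public ServiceGroupIdentifier getIdentifier() {
            return identifier;
        }

        @Override
        public void instantiateServiceInstance() {
            // This node just became owner/master of the device; only now may
            // it write the device's state into the operational inventory.
        }

        @Override
        public ListenableFuture<?> closeServiceInstance() {
            // Ownership is moving away (or shutting down): stop writing.
            return Futures.immediateFuture(null);
        }
    }

    // Registration (provider typically injected via Blueprint/OSGi):
    // ClusterSingletonServiceRegistration reg =
    //     provider.registerClusterSingletonService(new DeviceMastership("openflow:1"));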

Note that an improvement to the latency of master/backup selection is possible
here. It was presented by HPE a long time ago, but it never made its way
upstream.

[snip]

>     # Q4. Can anybody recommend the node count for an ODL cluster that will
>     manage 10000 compute/network nodes? I think the leader nodes will have
>     too high a workload if the number of ODL cluster nodes is too big, so
>     it can't scale horizontally; per the current default shard strategy,
>     every node holds the entire data store, which looks more like data store
>     replication than distributing the data store across all the nodes.
> 
> In my experience and opinion, ODL in a clustered setup is not a solution
> here. As I mentioned above, with a clustered setup I can think of two
> possible solutions. Deploying 20 clusters will be an operational
> nightmare (e.g. per-cluster partition issues, devices switching between
> clusters, device inventory data sharing across clusters on device
> switching, etc.). Apart from that, you will need an external mechanism
> to share the data between these clusters. And depending on your
> application, things can get even more complicated to maintain in a
> production environment. If you go with the second option of 60 nodes in
> a cluster, I am not even sure the cluster will even boot up properly :),
> let alone manage the devices. To make it work, you need to go with
> prefix-based sharding and cook a solution per device (a per-device
> shard, the nodes where this shard can be replicated, making sure that
> the device connection only switches to a node where the device shard is
> replicated, etc.).
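
(For context, with the default module-based sharding, shard replica placement
is fixed statically in configuration/initial/module-shards.conf; a per-device,
prefix-based scheme would need that placement decided programmatically per
shard. A sketch of the stock file format, with illustrative member names:)

    module-shards = [
        {
            name = "inventory"
            shards = [
                {
                    name = "inventory"
                    # Replicas pin which cluster members hold this shard.
                    replicas = ["member-1", "member-2", "member-3"]
                }
            ]
        }
    ]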

I think we need to drill down into assumptions and design a bit more:
- what is the mode of operation of OFP here? Does it include FRM?
- how many flows are expected to be programmed on each device?
- what is the expected stability of those flows?

Can you share details about your experience (ODL version, setup, numbers)?

At the end of the day, there is precious little hard data going around, as the
Performance Report has been an orphan ever since Marcus left.
Refreshing that report is probably the low-hanging fruit here: Yi, are you 
willing to organize and drive that effort?

>     # Q5. Is it possible to run an asymmetric ODL cluster? I mean some
>     nodes are full stack (netvirt, sfc, genius, etc.) and some nodes are
>     southbound only (only openflowplugin and ovsdb installed). I don't
>     think we need to run anything other than the southbound protocols on
>     the southbound device-management nodes.
> 
> I think you can do that, but if you want HA for your application and
> southbound plugins, and also want to run them in exclusion, a 3-node
> cluster is not going to work (you need at least 4 nodes in the cluster).

The total number of cluster nodes always needs to be odd, unless we are talking
about an externally managed active/passive deployment, in which case it needs
to be 2 * n, where n is an odd number larger than 2.

4 nodes is never a valid option. 3, 5, 6 (a/p), 7, 9, 10 (a/p), ... are valid
options.
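
(The underlying arithmetic, assuming Raft-style majority voting: quorum is
floor(N/2) + 1, so a 3-node cluster needs 2 votes and tolerates 1 failure,
while a 4-node cluster needs 3 votes and still tolerates only 1 failure. The
fourth node adds replication and coordination cost without adding fault
tolerance, which is why even counts only make sense as externally managed
active/passive pairs.)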

>     # Q7. Can anybody propose a good ODL clustering solution for a
>     super-scale data center which has 10000 nodes?
> 
> In my experience, if you are looking for a stable production environment
> with low operational cost (logistics, resources, support, etc.), ODL in a
> "clustering" environment is probably not an at-par solution. Luis and I
> shared some high-level thoughts on how we can achieve this kind of scale
> using a horizontally scalable system at the ONS summit. Here is the deck
> if you want more details.
> https://docs.google.com/presentation/d/1gDLHyyuh8VVRpzHpTq9GDkv4XKAe3EaSbm2uGJFTiO8/edit?usp=sharing

I do not want to sound dismissive, but those slides are general guidelines 
without any validation -- there is no apples-to-apples comparison between two 
implementations of the same use case, one implemented as-is and one with the 
guidelines.

Regards,
Robert
