Robert, thank you so much for your insightful answers. I'm wondering if we can 
have a meeting to discuss this specifically. Last week we discussed it in the 
openflowplugin weekly meeting. I believe you participated in the ODL DDF and 
joined the discussion about the "ODL scale" topic Luis presented, so it would 
be great to have a meeting focused on this.

Abhijit, Anil, does my suggestion make sense? Robert is the strongest advocate 
for the akka-based ODL cluster.

Robert, in an ODL cluster there is only one leader node no matter how many 
nodes we have. I think that is a bottleneck, isn't it? In my mind, a message 
queue is the only feasible way to synchronize data in a larger-scale 
distributed application, and I'm not sure whether akka handles data 
synchronization the same way. I would like to get your thoughts on this. I know 
akka uses gossip, but the leader node is responsible for synchronizing data to 
all the other follower nodes, and this is a big issue. In a message queue 
solution, the message servers handle this workload and the data producer sends 
the data just once. In the current ODL cluster, I think the leader node sends 
the data N-1 times, once to each follower node. Please correct me if I'm wrong.
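
To make the fan-out concern above concrete, here is a minimal Python sketch 
that only counts messages. It is not ODL or Akka code: it assumes the leader 
sends each follower its own copy of every write, and both function names are 
hypothetical.

```python
def leader_sends_per_write(cluster_size: int) -> int:
    """Copies the leader sends for one write when it replicates
    directly to every follower (assumption: one copy per follower)."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size - 1


def producer_sends_per_write_with_broker() -> int:
    """With a message broker in between, the producer publishes once
    and the broker takes over the fan-out to subscribers."""
    return 1


if __name__ == "__main__":
    for n in (3, 5, 128):
        print(n, leader_sends_per_write(n), producer_sends_per_write_with_broker())
```

Note that a broker still has to deliver N-1 copies in total; the sketch only 
shows that this work moves off the data producer.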

-----Original Message-----
From: Robert Varga [mailto:n...@hq.sk] 
Sent: June 5, 2019 3:20
To: Yi Yang (杨燚)-云服务集团 <yangy...@inspur.com>; vishnoia...@gmail.com
Cc: avish...@luminanetworks.com; openflowplugin-...@lists.opendaylight.org; 
robert.va...@pantheon.tech; mdsal-...@lists.opendaylight.org; 
abhijit.kumbh...@ericsson.com; d...@lists.opendaylight.org; 
controller-dev@lists.opendaylight.org
Subject: Re: Re: [controller-dev] Re: Is Read from follower shard ok and 
openflowplugin master must be shard leader?

On 04/06/2019 02:29, Yi Yang (杨燚)-云服务集团 wrote:
> Robert, we're talking about scalability. Can you tell us how many nodes the 
> current akka-based clustering can support at most?

Yi,

I think we have a vocabulary (i.e. language) discrepancy. In order to be
clear:

- "performance" means how fast a system is when operating with a certain 
working set

- "scalability" means how well a system is able to maintain performance when 
the working set is increased. I think you may have meant this when you asked 
about IMDT "efficiency", but I can't be sure.

In a potentially-distributed system, there are two distinct parts which affect 
how the system can scale:

- "vertical scalability" means how well the system can be scaled by increasing 
resources available to individual nodes

- "horizontal scalability" means how well the system can be scaled by 
increasing the number of individual nodes

I think it is always a more efficient use of resources to allocate them to 
scaling vertically rather than horizontally -- each node participating in a 
distributed system typically requires non-zero overhead.
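
The overhead argument can be illustrated with a toy model in Python. This is 
only a sketch under stated assumptions: every node pays the same fixed 
overhead, resources are a single abstract number, and the function name and 
the figures used below are hypothetical.

```python
def usable_capacity(total_resources: float, nodes: int, per_node_overhead: float) -> float:
    """Resources left for the working set when a fixed budget is split
    evenly across `nodes`, each paying `per_node_overhead` (assumed constant)."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    per_node = total_resources / nodes
    # A node cannot contribute negative capacity.
    return nodes * max(per_node - per_node_overhead, 0.0)


if __name__ == "__main__":
    # Same total budget, spent on one big node versus eight small ones.
    print(usable_capacity(64, 1, 2))
    print(usable_capacity(64, 8, 2))
```

Under these assumptions the single large node keeps more of the budget for 
real work; the model ignores everything else, such as fault tolerance.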

The number of potential nodes is limited by what Akka can provide us with -- 
which I see no problem with based on 
https://www.lightbend.com/blog/running-a-2400-akka-nodes-cluster-on-google-compute-engine.

> Per my understanding, current ODL clustering is more like a disaster backup 
> solution for data store, I don't think it can work correctly if we have 128 
> nodes there.

I am not sure what that understanding is based on. CDS uses an implementation 
of RAFT, which does not place artificial limits on the number of participating 
nodes.
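
For reference, Raft commits an entry once a majority of the voting members 
has stored it, so the cluster size changes the quorum but the protocol itself 
imposes no upper bound. A minimal Python sketch of that arithmetic follows; the 
function names are hypothetical and it says nothing about CDS-specific 
configuration.

```python
def raft_quorum(cluster_size: int) -> int:
    """Smallest majority of `cluster_size` voting members."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """Members that can be down while a majority is still reachable."""
    return cluster_size - raft_quorum(cluster_size)


if __name__ == "__main__":
    for n in (3, 5, 128):
        print(n, raft_quorum(n), tolerated_failures(n))
```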

I do not see any design issue with deploying CDS on such a large number of 
nodes. There may be bugs, but those are just bugs -- I do believe it
*will* work correctly.

> In a cloud environment, tenants are dynamically creating and destroying VMs, 
> which installs and removes flows very often, and openflow statistics are also 
> a non-trivial load for openflow. Per current openflowplugin clustering, one 
> ovs node is connected to 3 odl nodes over permanent tcp connections. How 
> many ovs nodes can 3 odl nodes support at most? Has anybody tested it? I 
> think it won't surpass 100.

That largely depends on what flows are loaded on the switches.

Yes, somebody tested it, and yes, it did surpass 100, thank you:
https://slides.com/dfarrell07/odl-perf/fullscreen#/1

> As I said, config inventory will have 2MB data in a 3 nodes environment, you 
> can evaluate how much data is there if we have 10000 nodes, do you think 
> current ODL replication mechanism can work well?

As I wrote previously, this heavily depends on the structure of the data, what 
the application does and how. It also depends on the software being used.

To get definitive answers, I do suggest running some tests and evaluating them.

> I know Pantheon has some commercial deployment in production environments, 
> can you tell us how many devices/nodes you can support at most in a 3 node 
> ODL cluster?

Not really, sorry.

Even if I could, the numbers depend on the particulars of a deployment and I 
have precious few details about what exactly you are doing and how -- 
and thus could not select the relevant data to share.

> Performance and scalability are two different things. We can always get some 
> performance improvement by optimizing, but scalability is not like that: we 
> have to redesign something to get scalability. Has any ODL developer ever 
> considered how ODL supports a 10000-node cloud environment? You are MDSAL 
> experts, so it would be great if you could share your insights about this here.

Yes, we have considered both horizontal and vertical scalability when we 
architected the MD-SAL and yes, 10K (and more) nodes were considered and taken 
into account.

Horizontal scalability usually means partitioning the working set in one way or 
another -- how exactly you do that is part of the solution design.

Any decisions made here must be made with regard to the actual application 
being run on the system, as data layout has strong interplay with data access 
patterns.
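
As one illustration of partitioning a working set, the Python sketch below 
assigns keys to partitions with a stable hash. Hash-based assignment is an 
assumption chosen for brevity, not a description of how CDS maps data to 
shards, and the function name and the device ids are hypothetical.

```python
import hashlib
from collections import defaultdict


def partition_for(key: str, partition_count: int) -> int:
    """Map a key to a partition index using a stable hash, so the same
    key lands on the same partition across processes and restarts."""
    if partition_count < 1:
        raise ValueError("partition_count must be at least 1")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


if __name__ == "__main__":
    # Hypothetical device ids spread over four partitions.
    layout = defaultdict(list)
    for i in range(1, 11):
        device = f"openflow:{i}"
        layout[partition_for(device, 4)].append(device)
    for index in sorted(layout):
        print(index, layout[index])
```

Whether such a split helps depends on the access pattern: a transaction that 
touches keys in several partitions has to coordinate across all of them.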

Regards,
Robert

_______________________________________________
controller-dev mailing list
controller-dev@lists.opendaylight.org
https://lists.opendaylight.org/mailman/listinfo/controller-dev
