Hi All,
In previous MB versions, the global state of the MB cluster becomes
inconsistent when the network is partitioned (split brain). As a solution,
we propose the following:
1) An MB cluster cannot operate below a defined number of nodes (a.k.a.
minimum cluster size).
2) During a network partition, if the node count (size) of a particular
partition is less than the 'minimum cluster size', then that partition
2.1) will stop accepting incoming traffic/connections
2.2) will disconnect all active connections (publishers/subscribers)
So the idea is to let only a single partition (one whose size >= minimum
cluster size) keep working while the others stop working.
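The rule in (2) can be sketched roughly as below. This is only an
illustration, not MB code: the BrokerNode class, its fields and the
on_membership_change callback are hypothetical names, and partition_size is
assumed to count the node itself plus the members it can still reach.

```python
from dataclasses import dataclass, field


@dataclass
class BrokerNode:
    """Hypothetical stand-in for one MB node (not an actual MB class)."""
    min_cluster_size: int
    accepting: bool = True
    connections: set = field(default_factory=set)

    def on_membership_change(self, partition_size: int) -> None:
        # partition_size: nodes visible in this partition, including this one.
        if partition_size < self.min_cluster_size:
            # 2.1) stop accepting incoming traffic/connections
            self.accepting = False
            # 2.2) disconnect all active publishers/subscribers
            self.connections.clear()
        else:
            # The partition is large enough, so keep (or resume) serving.
            self.accepting = True


node = BrokerNode(min_cluster_size=2, connections={"publisher-1", "subscriber-1"})
node.on_membership_change(partition_size=1)  # this node got isolated
print(node.accepting, len(node.connections))
```

Whether a node should automatically resume once the partition heals is a
design choice this sketch simply assumes.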
Therefore, choosing the right 'minimum cluster size' is important when
deploying MB. Otherwise, users could end up with multiple network partitions
(each with size >= minimum cluster size) working in parallel, recreating the
problem we are trying to solve here.
So here's the way to pick the number:
| Cluster size | Minimum Node Count |
|--------------|--------------------|
| 2            | 2                  |
| 3            | 2                  |
| 4            | 3                  |
| 5            | 3                  |
| N            | floor(N / 2) + 1   |
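As a quick check of the table, the last row can be computed with integer
division. The function name minimum_node_count below is just an illustrative
choice, not an existing MB setting.

```python
def minimum_node_count(cluster_size: int) -> int:
    """Smallest partition size that is a strict majority of the cluster."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    # Integer division gives floor(N / 2); adding 1 makes it a strict majority.
    return cluster_size // 2 + 1


for n in (2, 3, 4, 5):
    print(n, minimum_node_count(n))
```

Since 2 * (floor(N / 2) + 1) is always greater than N, two disjoint partitions
can never both reach this minimum, which is what keeps at most one partition
working.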
This has a direct effect on the minimum HA deployment for MB, which used to
be 2 nodes.
Why?
Suppose users now deploy a 2-node MB cluster with this feature enabled. Then,
during a network partition, both nodes will stop working. This may be fine
since it keeps the MB cluster reliable, but from the user's point of view it
is a complete outage (since none of the nodes accept traffic).
Therefore, the minimum HA node count for MB now becomes 3.
When the cluster size is 3, it can withstand 1 node being cut off by a
network partition (the other 2 nodes keep working).
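To make the 2-node vs 3-node comparison concrete, here is a small sketch that
takes the sizes of the partitions after a split and returns the ones that
would keep serving. The helper name surviving_partitions is hypothetical, and
the minimum is assumed to be floor(N / 2) + 1 of the total node count.

```python
def surviving_partitions(partition_sizes):
    """Return the sizes of partitions that still meet the minimum cluster size."""
    cluster_size = sum(partition_sizes)
    minimum = cluster_size // 2 + 1  # assumed rule: floor(N / 2) + 1
    return [size for size in partition_sizes if size >= minimum]


# 2-node cluster split 1/1: no partition reaches the minimum of 2.
print(surviving_partitions([1, 1]))
# 3-node cluster split 2/1: only the 2-node side keeps working.
print(surviving_partitions([2, 1]))
```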
Thoughts?
Jira: https://wso2.org/jira/browse/MB-1664
--
Ramith Jayasinghe
Technical Lead
WSO2 Inc., http://wso2.com
lean.enterprise.middleware
E: [email protected]
P: +94 772534930
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture