Hi All,

In MB, we use a coordinator-based approach to manage the distributed
messaging algorithm in the cluster. Currently, Hazelcast is used to elect
the coordinator. However, one issue we faced with Hazelcast is that during
a network segmentation (split brain), it can elect two or more coordinators
in the cluster. This affects the correctness of the distributed messaging
algorithm, since some tables in the database should only be edited by a
single node (i.e. the coordinator).

As a solution to this problem, we have implemented a minimum node count
based approach [1] that deactivates a partitioned set of nodes, preventing
multiple nodes from becoming coordinators until the network segmentation
is resolved.
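A rough sketch of the idea behind a minimum node count check, assuming each node can count the cluster members it currently sees (itself included) and that the minimum is a configured value. `PartitionGuard` and its methods are hypothetical names for illustration, not the implementation from [1].

```python
from dataclasses import dataclass


@dataclass
class PartitionGuard:
    """Hypothetical guard: a node stays active only while it can see at
    least `min_node_count` cluster members, itself included."""
    cluster_size: int
    min_node_count: int

    def is_active(self, visible_members: int) -> bool:
        # A partition smaller than the minimum deactivates itself, so it
        # cannot elect a coordinator of its own.
        return visible_members >= self.min_node_count

    def at_most_one_active_partition(self) -> bool:
        # Two disjoint partitions can both reach the minimum only if
        # 2 * min_node_count <= cluster_size, so a minimum above half the
        # cluster size rules out two active sides.
        return 2 * self.min_node_count > self.cluster_size


guard = PartitionGuard(cluster_size=5, min_node_count=3)
# A 5-node cluster splits into partitions of 3 and 2 nodes.
print(guard.is_active(3), guard.is_active(2))  # True False
print(guard.at_most_one_active_partition())  # True
```

The trade-off of this style of check is availability: a split where no side reaches the minimum (for example 2 + 2 + 1 nodes in a 5-node cluster) deactivates every node until the network recovers.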

As an alternative solution, we are considering an RDBMS-based approach to
elect the coordinator node in the cluster. Because the election happens
through the database, this ensures that only one node is elected as the
coordinator, even during a network segmentation.
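A minimal sketch of how a database-backed election could work, assuming a hypothetical single-row `coordinator` table that holds the current coordinator and its last heartbeat (a lease). SQLite stands in for MB's RDBMS here; the table, columns and `try_become_coordinator` are illustrative names, not MB's schema or API.

```python
import sqlite3

# Hypothetical schema: exactly one row (anchor = 1) naming the coordinator.
SCHEMA = """
CREATE TABLE IF NOT EXISTS coordinator (
    anchor INTEGER PRIMARY KEY CHECK (anchor = 1),
    node_id TEXT NOT NULL,
    last_heartbeat REAL NOT NULL
)
"""


def try_become_coordinator(conn, node_id, now, lease_timeout):
    """Claim or renew the coordinator row; return True if node_id holds it.

    The database serialises concurrent attempts, so at most one node holds
    the row at any time, regardless of the network between the nodes.
    """
    with conn:  # one transaction per attempt
        # First election: the PRIMARY KEY lets only one INSERT succeed.
        conn.execute(
            "INSERT OR IGNORE INTO coordinator (anchor, node_id, last_heartbeat) "
            "VALUES (1, ?, ?)",
            (node_id, now),
        )
        # Renew our own lease, or take over only if the current one expired.
        conn.execute(
            "UPDATE coordinator SET node_id = ?, last_heartbeat = ? "
            "WHERE anchor = 1 AND (node_id = ? OR last_heartbeat < ?)",
            (node_id, now, node_id, now - lease_timeout),
        )
        row = conn.execute(
            "SELECT node_id FROM coordinator WHERE anchor = 1"
        ).fetchone()
    return row[0] == node_id


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
print(try_become_coordinator(conn, "node-a", now=0.0, lease_timeout=10.0))   # True
print(try_become_coordinator(conn, "node-b", now=1.0, lease_timeout=10.0))   # False
# node-a stops renewing; once its lease expires, node-b can take over.
print(try_become_coordinator(conn, "node-b", now=20.0, lease_timeout=10.0))  # True
```

Two caveats apply to this sketch. It compares timestamps supplied by the nodes, so a real version would likely use the database server's clock to avoid clock skew between nodes. And the single-coordinator guarantee only covers nodes that can reach the database: a former coordinator cut off from the database must stop acting as coordinator once it can no longer renew its lease.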

The algorithm will use a polling mechanism to check the liveness of the
nodes. To keep the election algorithm scalable, only the coordinator node
will check the status of all the nodes in the cluster, and it will inform
the other nodes through the database when a member joins or leaves. The
other nodes will only check the status of the coordinator node. When a
node detects that the coordinator is invalid, it will start an election
to choose a new coordinator.
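A sketch of the polling split described above, assuming hypothetical `node_heartbeat` and `membership_event` tables (SQLite again stands in for the RDBMS, and all names are illustrative). Every node writes its own heartbeat; only the coordinator scans all heartbeats and records join/leave events; every other node reads just the coordinator's row and treats a stale heartbeat as the cue to start an election.

```python
import sqlite3

# Hypothetical schema: one heartbeat row per node, plus a log of membership
# changes that the coordinator publishes for the other nodes to read.
SCHEMA = """
CREATE TABLE IF NOT EXISTS node_heartbeat (
    node_id TEXT PRIMARY KEY,
    last_heartbeat REAL NOT NULL
);
CREATE TABLE IF NOT EXISTS membership_event (
    event_id INTEGER PRIMARY KEY AUTOINCREMENT,
    node_id TEXT NOT NULL,
    change TEXT NOT NULL CHECK (change IN ('ADDED', 'LEFT'))
);
"""


def heartbeat(conn, node_id, now):
    """Called periodically by every node, the coordinator included."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO node_heartbeat (node_id, last_heartbeat) "
            "VALUES (?, ?)",
            (node_id, now),
        )


def coordinator_poll(conn, known_members, now, timeout):
    """Coordinator only: diff live nodes against the last known set and
    record ADDED/LEFT events. Returns the new member set."""
    rows = conn.execute(
        "SELECT node_id FROM node_heartbeat WHERE last_heartbeat >= ?",
        (now - timeout,),
    ).fetchall()
    live = {node_id for (node_id,) in rows}
    with conn:
        for node_id in sorted(live - known_members):
            conn.execute(
                "INSERT INTO membership_event (node_id, change) VALUES (?, 'ADDED')",
                (node_id,),
            )
        for node_id in sorted(known_members - live):
            conn.execute(
                "INSERT INTO membership_event (node_id, change) VALUES (?, 'LEFT')",
                (node_id,),
            )
    return live


def coordinator_is_valid(conn, coordinator_id, now, timeout):
    """Non-coordinator nodes: check only the coordinator's heartbeat.
    False means the node should start an election."""
    row = conn.execute(
        "SELECT last_heartbeat FROM node_heartbeat WHERE node_id = ?",
        (coordinator_id,),
    ).fetchone()
    return row is not None and row[0] >= now - timeout


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
for node in ("node-a", "node-b", "node-c"):
    heartbeat(conn, node, now=0.0)
# Assume node-a is the coordinator.
members = coordinator_poll(conn, set(), now=0.0, timeout=5.0)
# node-c stops heartbeating; node-a and node-b keep going.
heartbeat(conn, "node-a", now=10.0)
heartbeat(conn, "node-b", now=10.0)
members = coordinator_poll(conn, members, now=10.0, timeout=5.0)
print(sorted(members))  # ['node-a', 'node-b']
print(coordinator_is_valid(conn, "node-a", now=12.0, timeout=5.0))  # True
print(coordinator_is_valid(conn, "node-a", now=30.0, timeout=5.0))  # False
```

With N nodes, this keeps the polling load at one full scan per interval (by the coordinator) plus N - 1 single-row reads, instead of every node scanning every other node.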

We are currently working on a POC to test how this works with MB's
slot-based messaging algorithm.

Thoughts?

[1] https://wso2.org/jira/browse/MB-1664

-- 
Asanka Abeyweera
Senior Software Engineer
WSO2 Inc.

Phone: +94 712228648
Blog: a5anka.github.io

_______________________________________________
Architecture mailing list
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
