Re: [Architecture] RDBMS based coordinator election algorithm for MB

Imesh Gunaratne Thu, 04 Aug 2016 06:54:54 -0700

Hi Asanka,

Do we really need to implement a leader election algorithm on our own?
AFAIU this is a complex problem which has been already solved by several
algorithms [1]. IMO it would be better to go ahead with an existing well
established implementation on etcd [1] or Consul [2].


Those provide HTTP APIs for clients to make leader election calls. [3] is a
client library written in Node.js for etcd based leader election.

[1] https://www.projectcalico.org/using-etcd-for-elections
[2] https://www.consul.io/docs/guides/leader-election.html
[3] https://www.npmjs.com/package/etcd-leader

Thanks

On Wed, Aug 3, 2016 at 5:12 PM, Asanka Abeyweera <[email protected]> wrote:

> Hi Maninda,
>
> Since we are using RDBMS to poll the node status, the cluster will not end
> up in situation 1,2 or 3. With this approach we consider a node unreachable
> when it cannot access the database. Therefore an unreachable node can never
> be the leader.
>
> As you have mentioned, we are currently using the RDBMS as an atomic
> global variable to create the coordinator entry.
>
> On Tue, Aug 2, 2016 at 5:22 PM, Maninda Edirisooriya <[email protected]>
> wrote:
>
>> Hi Asanka,
>>
>> As I understand the accuracy of electing the leader correctly is
>> dependent on the election mechanism with RDBMS because there can be edge
>> cases like,
>>
>> 1. Unreachable leader activates during the election process: Then who
>> becomes the leader?
>> 2. The elected leader becomes unreachable before the election is
>> completed: Then will there be a situation where there is no leader?
>> 3. A leader and a set of nodes are disconnected from the other part of
>> the cluster and while the leader is trying to remove unreachable members
>> other part is calling an election to make a leader: Who will win?
>>
>> RDBMS based election algorithm should handle such cases without bringing
>> the cluster to an inconsistent state or dead lock in all concurrent cases.
>> If all these kind of cases cannot be handled isn't it better to keep the
>> current hazelcast clustering and use the RDBMS only to handle the split
>> brain scenario? In other words when a new hazelcast leader is elected it
>> should be updated in the RDBMS. If another split party has already elected
>> a leader, the node who is going to write it to RDBMS should avoid updating
>> it. Simply, the RDBMS can be used as an atomic global variable to keep the
>> leader name by modifying the hazelcast clustering. WDYT?
>>
>> Thanks.
>>
>>
>> *Maninda Edirisooriya*
>> Senior Software Engineer
>>
>> *WSO2, Inc.*lean.enterprise.middleware.
>>
>> *Blog* : http://maninda.blogspot.com/
>> *E-mail* : [email protected]
>> *Skype* : @manindae
>> *Twitter* : @maninda
>>
>> On Thu, Jul 28, 2016 at 4:38 PM, Asanka Abeyweera <[email protected]>
>> wrote:
>>
>>> Hi Akila,
>>>
>>> Let me explain the issue in a different way. Let's assume the MB nodes
>>> are using two different network interfaces for Hazelcast communication and
>>> database communication. With such a configuration, there can be failures
>>> only in the network interface used for Hazelcast communication in some
>>> nodes. When this happens, there will be two or more Hazelcast clusters due
>>> to the network segmentation, and as a result there will be multiple
>>> coordinators. Since every node still have access to the database, multiple
>>> coordinators can affect the correctness of the data stored in the DB. But
>>> if we used a RDBMS based approach we won't have multiple coordinators due
>>> to a network partition in Hazelcast. This is one advantage we get from this
>>> approach.
>>>
>>> Even when we use Zookeeper or RAFT the same issue will be there since we
>>> are using different interfaces for Hazelcast communication and DB
>>> communication.
>>>
>>>
>>> On Thu, Jul 28, 2016 at 2:56 PM, Akila Ravihansa Perera <
>>> [email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> What's the advantage of using RDBMS (even as an alternative) to
>>>> implement a leader/coordinator election? If the network connection to DB
>>>> fails then this will be a single point of failure. I don't think we can
>>>> scale RDBMS instances and expect the election algorithm to work. That would
>>>> be reducing this problem to another problem (electing coordinator RDBMS
>>>> instance).
>>>>
>>>> IMHO it would be better to look at Zookeeper Atomic Broadcast (ZAB) [1]
>>>> or RAFT leader election [2] algorithms which have already proven results.
>>>>
>>>> [1] https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab1.0
>>>> [2] http://libraft.io/
>>>>
>>>> Thanks.
>>>>
>>>> On Thu, Jul 28, 2016 at 1:42 PM, Nandika Jayawardana <[email protected]>
>>>> wrote:
>>>>
>>>>> +1 to make it a common component . We have the clustering
>>>>> implementation for BPEL component based on hazelcast.  If the coordination
>>>>> is available at RDBMS level, we can remove hazelcast dependancy.
>>>>>
>>>>> Regards
>>>>> Nandika
>>>>>
>>>>> On Thu, Jul 28, 2016 at 1:28 PM, Hasitha Aravinda <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Can we make it a common component, which is not hard coupled with MB.
>>>>>> BPS has the same requirement.
>>>>>>
>>>>>> Thanks,
>>>>>> Hasitha.
>>>>>>
>>>>>> On Thu, Jul 28, 2016 at 9:47 AM, Asanka Abeyweera <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> In MB, we have used a coordinator based approach to manage
>>>>>>> distributed messaging algorithm in the cluster. Currently Hazelcast is 
>>>>>>> used
>>>>>>> to elect the coordinator. But one issue we faced with Hazelcast is, 
>>>>>>> during
>>>>>>> a network segmentation (split brain), Hazelcast can elect two or more
>>>>>>> coordinators in the cluster. This affects the correctness of the
>>>>>>> distributed messaging algorithm since there are some tables in the 
>>>>>>> database
>>>>>>> that should only be edited by a single node (i.e. coordinator).
>>>>>>>
>>>>>>> As a solution to this problem we have implemented minimum node count
>>>>>>> based approach [1] to deactivate set of partitioned nodes to stop 
>>>>>>> multiple
>>>>>>> nodes becoming coordinators until the network segmentation issue is 
>>>>>>> fixed.
>>>>>>>
>>>>>>> As an alternative solution, we are thinking of implementing an RDBMS
>>>>>>> based approach to elect the coordinator node in the cluster. By doing 
>>>>>>> this
>>>>>>> we can make sure that even during a network segmentation only one node 
>>>>>>> will
>>>>>>> be elected as the coordinator node since the election is happening 
>>>>>>> through
>>>>>>> the database.
>>>>>>>
>>>>>>> The algorithm will use a polling mechanism to check the validity of
>>>>>>> the nodes. To make the election algorithm scalable, only the coordinator
>>>>>>> node will be checking status of all the nodes in the cluster and it will
>>>>>>> inform other nodes through database when a member is added/left. The 
>>>>>>> nodes
>>>>>>> will be only checking for the status of the coordinator node. When a 
>>>>>>> node
>>>>>>> detect that coordinator is invalid it will go for a election to elect a 
>>>>>>> new
>>>>>>> coordinator.
>>>>>>>
>>>>>>> We are currently working on a POC to test how this works with MB's
>>>>>>> slot based messaging algorithm.
>>>>>>>
>>>>>>> thoughts?
>>>>>>>
>>>>>>> [1] https://wso2.org/jira/browse/MB-1664
>>>>>>>
>>>>>>> --
>>>>>>> Asanka Abeyweera
>>>>>>> Senior Software Engineer
>>>>>>> WSO2 Inc.
>>>>>>>
>>>>>>> Phone: +94 712228648
>>>>>>> Blog: a5anka.github.io
>>>>>>>
>>>>>>> <https://wso2.com/signature>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Architecture mailing list
>>>>>>> [email protected]
>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> --
>>>>>> Hasitha Aravinda,
>>>>>> Associate Technical Lead,
>>>>>> WSO2 Inc.
>>>>>> Email: [email protected]
>>>>>> Mobile : +94 718 210 200
>>>>>>
>>>>>> _______________________________________________
>>>>>> Architecture mailing list
>>>>>> [email protected]
>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Nandika Jayawardana
>>>>> WSO2 Inc ; http://wso2.com
>>>>> lean.enterprise.middleware
>>>>>
>>>>> _______________________________________________
>>>>> Architecture mailing list
>>>>> [email protected]
>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Akila Ravihansa Perera
>>>> WSO2 Inc.;  http://wso2.com/
>>>>
>>>> Blog: http://ravihansa3000.blogspot.com
>>>>
>>>> _______________________________________________
>>>> Architecture mailing list
>>>> [email protected]
>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>
>>>>
>>>
>>>
>>> --
>>> Asanka Abeyweera
>>> Senior Software Engineer
>>> WSO2 Inc.
>>>
>>> Phone: +94 712228648
>>> Blog: a5anka.github.io
>>>
>>> <https://wso2.com/signature>
>>>
>>> _______________________________________________
>>> Architecture mailing list
>>> [email protected]
>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>
>>>
>>
>
>
> --
> Asanka Abeyweera
> Senior Software Engineer
> WSO2 Inc.
>
> Phone: +94 712228648
> Blog: a5anka.github.io
>
> <https://wso2.com/signature>
>
> _______________________________________________
> Architecture mailing list
> [email protected]
> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>
>


-- 
*Imesh Gunaratne*
Software Architect
WSO2 Inc: http://wso2.com
T: +94 11 214 5345 M: +94 77 374 2057
W: https://medium.com/@imesh TW: @imesh
lean. enterprise. middleware

_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture

Re: [Architecture] RDBMS based coordinator election algorithm for MB

Reply via email to