Re: [Architecture] RDBMS based coordinator election algorithm for MB

Asanka Abeyweera Thu, 04 Aug 2016 06:58:34 -0700

Hi Imesh,

We are not implementing this to overcome a limitation in the coordination
algorithm available in Hazelcast. We are implementing this because we
need an RDBMS based coordination algorithm (not a network based algorithm).
The reason is that a network based election algorithm can elect multiple
leaders when the network between the nodes is partitioned while every node
can still reach the database. With an RDBMS based algorithm this will not
happen. We could not find an open source library that implements an RDBMS
based coordination algorithm, which is why we started writing our own.
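
To illustrate what "RDBMS based" means here, below is a minimal sketch (not
our actual implementation) in Python, with SQLite standing in for the RDBMS.
The table name, column names, function names and the timeout value are all
assumptions made for this example. The idea is that a single-row table acts
as the coordinator entry: only one INSERT can succeed, and a node that cannot
reach the database cannot refresh its heartbeat, so it loses the role once
the entry goes stale. The current time is passed in as a parameter to keep
the example deterministic; a real deployment would more likely rely on the
database clock.

```python
import sqlite3

HEARTBEAT_TIMEOUT = 5.0  # seconds; hypothetical value chosen for the example


def create_schema(conn):
    # The CHECK constraint allows at most one row, so the table behaves
    # like a single atomic "who is the coordinator" variable.
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS coordinator ("
            " id INTEGER PRIMARY KEY CHECK (id = 1),"
            " node_id TEXT NOT NULL,"
            " last_heartbeat REAL NOT NULL)"
        )


def try_become_coordinator(conn, node_id, now, timeout=HEARTBEAT_TIMEOUT):
    """Return True if node_id now holds the coordinator entry."""
    try:
        with conn:  # one transaction: commit on success, roll back on error
            # Drop the entry only if its owner stopped sending heartbeats.
            conn.execute(
                "DELETE FROM coordinator WHERE last_heartbeat < ?",
                (now - timeout,),
            )
            # Fails with IntegrityError if a live coordinator entry exists.
            conn.execute(
                "INSERT INTO coordinator (id, node_id, last_heartbeat)"
                " VALUES (1, ?, ?)",
                (node_id, now),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def send_heartbeat(conn, node_id, now):
    """Refresh the entry; returns False if node_id is no longer coordinator."""
    with conn:
        cur = conn.execute(
            "UPDATE coordinator SET last_heartbeat = ? WHERE node_id = ?",
            (now, node_id),
        )
    return cur.rowcount == 1
```

With this shape, a node that gets False back from send_heartbeat knows it
has been replaced and should stop acting as the coordinator.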




On Thu, Aug 4, 2016 at 7:16 PM, Imesh Gunaratne <[email protected]> wrote:

> Hi Asanka,
>
> Do we really need to implement a leader election algorithm on our own?
> AFAIU this is a complex problem which has been already solved by several
> algorithms [1]. IMO it would be better to go ahead with an existing well
> established implementation on etcd [1] or Consul [2].
>
> Those provide HTTP APIs for clients to make leader election calls. [3] is
> a client library written in Node.js for etcd based leader election.
>
> [1] https://www.projectcalico.org/using-etcd-for-elections
> [2] https://www.consul.io/docs/guides/leader-election.html
> [3] https://www.npmjs.com/package/etcd-leader
>
> Thanks
>
> On Wed, Aug 3, 2016 at 5:12 PM, Asanka Abeyweera <[email protected]>
> wrote:
>
>> Hi Maninda,
>>
>> Since we are using the RDBMS to poll the node status, the cluster will not
>> end up in situations 1, 2 or 3. With this approach we consider a node
>> unreachable when it cannot access the database. Therefore an unreachable
>> node can never be the leader.
>>
>> As you have mentioned, we are currently using the RDBMS as an atomic
>> global variable to create the coordinator entry.
>>
>> On Tue, Aug 2, 2016 at 5:22 PM, Maninda Edirisooriya <[email protected]>
>> wrote:
>>
>>> Hi Asanka,
>>>
>>> As I understand it, whether the leader is elected correctly depends on
>>> the election mechanism used with the RDBMS, because there can be edge
>>> cases like:
>>>
>>> 1. Unreachable leader activates during the election process: Then who
>>> becomes the leader?
>>> 2. The elected leader becomes unreachable before the election is
>>> completed: Then will there be a situation where there is no leader?
>>> 3. A leader and a set of nodes are disconnected from the other part of
>>> the cluster, and while the leader is trying to remove unreachable members
>>> the other part is calling an election to make a leader: Who will win?
>>>
>>> An RDBMS based election algorithm should handle such cases without
>>> bringing the cluster to an inconsistent state or a deadlock in all
>>> concurrent cases. If all these kinds of cases cannot be handled, isn't it
>>> better to keep the current Hazelcast clustering and use the RDBMS only to
>>> handle the split brain scenario? In other words, when a new Hazelcast
>>> leader is elected it should be updated in the RDBMS. If another split
>>> party has already elected a leader, the node that is about to write to the
>>> RDBMS should skip the update. Simply put, the RDBMS can be used as an
>>> atomic global variable to keep the leader name by modifying the Hazelcast
>>> clustering. WDYT?
>>>
>>> Thanks.
>>>
>>>
>>> *Maninda Edirisooriya*
>>> Senior Software Engineer
>>>
>>> *WSO2, Inc.*lean.enterprise.middleware.
>>>
>>> *Blog* : http://maninda.blogspot.com/
>>> *E-mail* : [email protected]
>>> *Skype* : @manindae
>>> *Twitter* : @maninda
>>>
>>> On Thu, Jul 28, 2016 at 4:38 PM, Asanka Abeyweera <[email protected]>
>>> wrote:
>>>
>>>> Hi Akila,
>>>>
>>>> Let me explain the issue in a different way. Let's assume the MB nodes
>>>> are using two different network interfaces for Hazelcast communication and
>>>> database communication. With such a configuration, a failure can affect
>>>> only the network interface used for Hazelcast communication on some
>>>> nodes. When this happens, there will be two or more Hazelcast clusters due
>>>> to the network segmentation, and as a result there will be multiple
>>>> coordinators. Since every node still has access to the database, multiple
>>>> coordinators can affect the correctness of the data stored in the DB. But
>>>> if we use an RDBMS based approach we won't have multiple coordinators due
>>>> to a network partition in Hazelcast. This is one advantage we get from this
>>>> approach.
>>>>
>>>> Even if we use ZooKeeper or Raft the same issue will be there, since
>>>> we are using different interfaces for Hazelcast communication and DB
>>>> communication.
>>>>
>>>>
>>>> On Thu, Jul 28, 2016 at 2:56 PM, Akila Ravihansa Perera <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> What's the advantage of using an RDBMS (even as an alternative) to
>>>>> implement a leader/coordinator election? If the network connection to
>>>>> the DB fails then this will be a single point of failure. I don't think
>>>>> we can scale RDBMS instances and expect the election algorithm to work.
>>>>> That would be reducing this problem to another problem (electing a
>>>>> coordinator RDBMS instance).
>>>>>
>>>>> IMHO it would be better to look at the Zookeeper Atomic Broadcast (ZAB)
>>>>> [1] or RAFT leader election [2] algorithms, which have proven results.
>>>>>
>>>>> [1] https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab1.0
>>>>> [2] http://libraft.io/
>>>>>
>>>>> Thanks.
>>>>>
>>>>> On Thu, Jul 28, 2016 at 1:42 PM, Nandika Jayawardana <[email protected]
>>>>> > wrote:
>>>>>
>>>>>> +1 to make it a common component. We have the clustering
>>>>>> implementation for the BPEL component based on Hazelcast. If the
>>>>>> coordination is available at the RDBMS level, we can remove the
>>>>>> Hazelcast dependency.
>>>>>>
>>>>>> Regards
>>>>>> Nandika
>>>>>>
>>>>>> On Thu, Jul 28, 2016 at 1:28 PM, Hasitha Aravinda <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Can we make it a common component that is not tightly coupled with
>>>>>>> MB? BPS has the same requirement.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Hasitha.
>>>>>>>
>>>>>>> On Thu, Jul 28, 2016 at 9:47 AM, Asanka Abeyweera <[email protected]
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> In MB, we use a coordinator based approach to manage the
>>>>>>>> distributed messaging algorithm in the cluster. Currently Hazelcast is
>>>>>>>> used to elect the coordinator. One issue we faced with Hazelcast is
>>>>>>>> that, during a network segmentation (split brain), it can elect two or
>>>>>>>> more coordinators in the cluster. This affects the correctness of the
>>>>>>>> distributed messaging algorithm, since there are some tables in the
>>>>>>>> database that should only be edited by a single node (i.e. the
>>>>>>>> coordinator).
>>>>>>>>
>>>>>>>> As a solution to this problem we have implemented a minimum node
>>>>>>>> count based approach [1] that deactivates a set of partitioned nodes to
>>>>>>>> stop multiple nodes becoming coordinators until the network
>>>>>>>> segmentation issue is fixed.
>>>>>>>>
>>>>>>>> As an alternative solution, we are thinking of implementing an
>>>>>>>> RDBMS based approach to elect the coordinator node in the cluster. By
>>>>>>>> doing this we can make sure that even during a network segmentation
>>>>>>>> only one node will be elected as the coordinator node, since the
>>>>>>>> election happens through the database.
>>>>>>>>
>>>>>>>> The algorithm will use a polling mechanism to check the validity of
>>>>>>>> the nodes. To make the election algorithm scalable, only the
>>>>>>>> coordinator node will check the status of all the nodes in the cluster,
>>>>>>>> and it will inform the other nodes through the database when a member
>>>>>>>> is added or leaves. The other nodes will only check the status of the
>>>>>>>> coordinator node. When a node detects that the coordinator is invalid,
>>>>>>>> it will start an election to elect a new coordinator.
>>>>>>>>
>>>>>>>> We are currently working on a POC to test how this works with MB's
>>>>>>>> slot based messaging algorithm.
>>>>>>>>
>>>>>>>> Thoughts?
>>>>>>>>
>>>>>>>> [1] https://wso2.org/jira/browse/MB-1664
>>>>>>>>
>>>>>>>> --
>>>>>>>> Asanka Abeyweera
>>>>>>>> Senior Software Engineer
>>>>>>>> WSO2 Inc.
>>>>>>>>
>>>>>>>> Phone: +94 712228648
>>>>>>>> Blog: a5anka.github.io
>>>>>>>>
>>>>>>>> <https://wso2.com/signature>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Architecture mailing list
>>>>>>>> [email protected]
>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Hasitha Aravinda,
>>>>>>> Associate Technical Lead,
>>>>>>> WSO2 Inc.
>>>>>>> Email: [email protected]
>>>>>>> Mobile : +94 718 210 200
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Nandika Jayawardana
>>>>>> WSO2 Inc ; http://wso2.com
>>>>>> lean.enterprise.middleware
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Akila Ravihansa Perera
>>>>> WSO2 Inc.;  http://wso2.com/
>>>>>
>>>>> Blog: http://ravihansa3000.blogspot.com
>>>>>
>>>>
>>>
>>
>
>
> --
> *Imesh Gunaratne*
> Software Architect
> WSO2 Inc: http://wso2.com
> T: +94 11 214 5345 M: +94 77 374 2057
> W: https://medium.com/@imesh TW: @imesh
> lean. enterprise. middleware
>
>


-- 
Asanka Abeyweera
Senior Software Engineer
WSO2 Inc.

Phone: +94 712228648
Blog: a5anka.github.io

<https://wso2.com/signature>

