Re: [Architecture] RDBMS based coordinator election algorithm for MB

Imesh Gunaratne Thu, 04 Aug 2016 19:05:41 -0700

On Fri, Aug 5, 2016 at 7:31 AM, Imesh Gunaratne <[email protected]> wrote:
>
>
> You can see in [3] how K8S has implemented a leader election feature that
> products deployed on top of it can use.
>


Correction: Please refer to [4].


>
>
>> On Thu, Aug 4, 2016 at 7:27 PM, Asanka Abeyweera <[email protected]>
>> wrote:
>>
>>> Hi Imesh,
>>>
>>> We are not implementing this to overcome a limitation in the
>>> coordination algorithm available in Hazelcast. We are implementing this
>>> since we need an RDBMS based coordination algorithm (not a network based
>>> algorithm).
>>>
>>
> Are you saying that database connections do not use the same network used
> by Hazelcast?
> 
>
>
>>> The reason is, a network based election algorithm will always elect
>>> multiple leaders when the network is partitioned. But if we use an RDBMS
>>> based algorithm this will not happen.
>>>
>>
> I do not think your argument is correct. If there is a problem with the
> network, it may apply to both the Hazelcast based solution and the
> database based solution.
>
> [4] http://blog.kubernetes.io/2016/01/simple-leader-election-with-Kubernetes.html
>
> Thanks
>
>>
>>>
>>> On Thu, Aug 4, 2016 at 7:16 PM, Imesh Gunaratne <[email protected]> wrote:
>>>
>>>> Hi Asanka,
>>>>
>>>> Do we really need to implement a leader election algorithm on our own?
>>>> AFAIU this is a complex problem which has been already solved by several
>>>> algorithms [1]. IMO it would be better to go ahead with an existing well
>>>> established implementation on etcd [1] or Consul [2].
>>>>
>>>> Those provide HTTP APIs for clients to make leader election calls. [3]
>>>> is a client library written in Node.js for etcd based leader election.
>>>>
>>>> [1] https://www.projectcalico.org/using-etcd-for-elections
>>>> [2] https://www.consul.io/docs/guides/leader-election.html
>>>> [3] https://www.npmjs.com/package/etcd-leader
>>>>
>>>> Thanks
>>>>
>>>> On Wed, Aug 3, 2016 at 5:12 PM, Asanka Abeyweera <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Maninda,
>>>>>
>>>>> Since we are using the RDBMS to poll the node status, the cluster will
>>>>> not end up in situation 1, 2 or 3. With this approach we consider a node
>>>>> unreachable when it cannot access the database. Therefore an unreachable
>>>>> node can never be the leader.
>>>>>
>>>>> As you have mentioned, we are currently using the RDBMS as an atomic
>>>>> global variable to create the coordinator entry.
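A minimal sketch of the "atomic global variable" idea described above, assuming a single-row table guarded by a primary key. The table name `coordinator`, the column names, and the function names are hypothetical (the thread does not give a schema), and SQLite is used only so the example is self-contained; a production RDBMS would need its own connection handling and isolation settings.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical schema: the fixed key (slot = 1) means at most one
    # coordinator row can ever exist, so the database arbitrates the race.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS coordinator ("
        " slot INTEGER PRIMARY KEY CHECK (slot = 1),"
        " node_id TEXT NOT NULL)"
    )


def try_become_coordinator(conn, node_id):
    """Return True if this node created the coordinator entry."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO coordinator (slot, node_id) VALUES (1, ?)",
                (node_id,),
            )
        return True
    except sqlite3.IntegrityError:
        # Another node already holds the single coordinator row.
        return False


def current_coordinator(conn):
    """Return the node id stored in the coordinator entry, or None."""
    row = conn.execute(
        "SELECT node_id FROM coordinator WHERE slot = 1"
    ).fetchone()
    return row[0] if row else None
```

In this sketch a node that cannot reach the database simply cannot insert the row, which mirrors the point that an unreachable node can never become the leader.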
>>>>>
>>>>> On Tue, Aug 2, 2016 at 5:22 PM, Maninda Edirisooriya <[email protected]> wrote:
>>>>>
>>>>>> Hi Asanka,
>>>>>>
>>>>>> As I understand it, whether the leader is elected correctly depends on
>>>>>> the election mechanism used with the RDBMS, because there can be edge
>>>>>> cases like:
>>>>>>
>>>>>> 1. Unreachable leader activates during the election process: Then who
>>>>>> becomes the leader?
>>>>>> 2. The elected leader becomes unreachable before the election is
>>>>>> completed: Then will there be a situation where there is no leader?
>>>>>> 3. A leader and a set of nodes are disconnected from the other part
>>>>>> of the cluster, and while the leader is trying to remove unreachable
>>>>>> members the other part is calling an election to make a leader: Who
>>>>>> will win?
>>>>>>
>>>>>> An RDBMS based election algorithm should handle such cases without
>>>>>> bringing the cluster to an inconsistent state or deadlock in all
>>>>>> concurrent cases. If all these kinds of cases cannot be handled, isn't
>>>>>> it better to keep the current Hazelcast clustering and use the RDBMS
>>>>>> only to handle the split brain scenario? In other words, when a new
>>>>>> Hazelcast leader is elected it should be updated in the RDBMS. If
>>>>>> another split party has already elected a leader, the node that is
>>>>>> going to write it to the RDBMS should avoid updating it. Simply, the
>>>>>> RDBMS can be used as an atomic global variable to keep the leader name
>>>>>> by modifying the Hazelcast clustering.
>>>>>> WDYT?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>
>>>>>> *Maninda Edirisooriya*
>>>>>> Senior Software Engineer
>>>>>>
>>>>>> *WSO2, Inc.* lean.enterprise.middleware.
>>>>>>
>>>>>> *Blog* : http://maninda.blogspot.com/
>>>>>> *E-mail* : [email protected]
>>>>>> *Skype* : @manindae
>>>>>> *Twitter* : @maninda
>>>>>>
>>>>>> On Thu, Jul 28, 2016 at 4:38 PM, Asanka Abeyweera <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Akila,
>>>>>>>
>>>>>>> Let me explain the issue in a different way. Let's assume the MB
>>>>>>> nodes are using two different network interfaces for Hazelcast
>>>>>>> communication and database communication. With such a configuration,
>>>>>>> there can be failures only in the network interface used for Hazelcast
>>>>>>> communication in some nodes. When this happens, there will be two or
>>>>>>> more Hazelcast clusters due to the network segmentation, and as a
>>>>>>> result there will be multiple coordinators. Since every node still has
>>>>>>> access to the database, multiple coordinators can affect the
>>>>>>> correctness of the data stored in the DB. But if we used an RDBMS
>>>>>>> based approach we won't have multiple coordinators due to a network
>>>>>>> partition in Hazelcast. This is one advantage we get from this
>>>>>>> approach.
>>>>>>>
>>>>>>> Even when we use Zookeeper or RAFT the same issue will be there
>>>>>>> since we are using different interfaces for Hazelcast communication
>>>>>>> and DB communication.
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jul 28, 2016 at 2:56 PM, Akila Ravihansa Perera <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> What's the advantage of using RDBMS (even as an alternative) to
>>>>>>>> implement a leader/coordinator election? If the network connection to
>>>>>>>> the DB fails then this will be a single point of failure. I don't
>>>>>>>> think we can scale RDBMS instances and expect the election algorithm
>>>>>>>> to work. That would be reducing this problem to another problem
>>>>>>>> (electing a coordinator RDBMS instance).
>>>>>>>>
>>>>>>>> IMHO it would be better to look at Zookeeper Atomic Broadcast (ZAB)
>>>>>>>> [1] or RAFT leader election [2] algorithms which have already proven
>>>>>>>> results.
>>>>>>>>
>>>>>>>> [1] https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab1.0
>>>>>>>> [2] http://libraft.io/
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> On Thu, Jul 28, 2016 at 1:42 PM, Nandika Jayawardana <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> +1 to make it a common component. We have the clustering
>>>>>>>>> implementation for the BPEL component based on Hazelcast. If the
>>>>>>>>> coordination is available at RDBMS level, we can remove the Hazelcast
>>>>>>>>> dependency.
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> Nandika
>>>>>>>>>
>>>>>>>>> On Thu, Jul 28, 2016 at 1:28 PM, Hasitha Aravinda <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Can we make it a common component that is not tightly coupled with
>>>>>>>>>> MB? BPS has the same requirement.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Hasitha.
>>>>>>>>>>
>>>>>>>>>> On Thu, Jul 28, 2016 at 9:47 AM, Asanka Abeyweera <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi All,
>>>>>>>>>>>
>>>>>>>>>>> In MB, we have used a coordinator based approach to manage the
>>>>>>>>>>> distributed messaging algorithm in the cluster. Currently Hazelcast
>>>>>>>>>>> is used to elect the coordinator. But one issue we faced with
>>>>>>>>>>> Hazelcast is that, during a network segmentation (split brain),
>>>>>>>>>>> Hazelcast can elect two or more coordinators in the cluster. This
>>>>>>>>>>> affects the correctness of the distributed messaging algorithm since
>>>>>>>>>>> there are some tables in the database that should only be edited by
>>>>>>>>>>> a single node (i.e. the coordinator).
>>>>>>>>>>>
>>>>>>>>>>> As a solution to this problem we have implemented a minimum node
>>>>>>>>>>> count based approach [1] to deactivate a set of partitioned nodes,
>>>>>>>>>>> to stop multiple nodes becoming coordinators until the network
>>>>>>>>>>> segmentation issue is fixed.
>>>>>>>>>>>
>>>>>>>>>>> As an alternative solution, we are thinking of implementing an
>>>>>>>>>>> RDBMS based approach to elect the coordinator node in the cluster.
>>>>>>>>>>> By doing this we can make sure that even during a network
>>>>>>>>>>> segmentation only one node will be elected as the coordinator node
>>>>>>>>>>> since the election is happening through the database.
>>>>>>>>>>>
>>>>>>>>>>> The algorithm will use a polling mechanism to check the validity
>>>>>>>>>>> of the nodes. To make the election algorithm scalable, only the
>>>>>>>>>>> coordinator node will be checking the status of all the nodes in the
>>>>>>>>>>> cluster and it will inform other nodes through the database when a
>>>>>>>>>>> member is added/left. The other nodes will only be checking the
>>>>>>>>>>> status of the coordinator node. When a node detects that the
>>>>>>>>>>> coordinator is invalid it will go for an election to elect a new
>>>>>>>>>>> coordinator.
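The polling idea above could be sketched roughly as follows, assuming the coordinator proves its validity by refreshing a heartbeat timestamp in its own row. Everything here is an assumption made for illustration and not the POC's actual design: the schema, the function names, the `HEARTBEAT_TIMEOUT` value, and the use of SQLite. Time is passed in explicitly so the behaviour is easy to follow; a real deployment would more likely rely on the database server's clock.

```python
import sqlite3

HEARTBEAT_TIMEOUT = 5.0  # seconds; hypothetical value


def create_schema(conn):
    # Hypothetical single-row table holding the current coordinator.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS coordinator ("
        " slot INTEGER PRIMARY KEY CHECK (slot = 1),"
        " node_id TEXT NOT NULL,"
        " last_heartbeat REAL NOT NULL)"
    )


def send_heartbeat(conn, node_id, now):
    """Coordinator refreshes its entry; False means it lost the role."""
    with conn:
        cur = conn.execute(
            "UPDATE coordinator SET last_heartbeat = ?"
            " WHERE slot = 1 AND node_id = ?",
            (now, node_id),
        )
    return cur.rowcount == 1


def coordinator_is_valid(conn, now):
    """Other nodes poll this to decide whether an election is needed."""
    row = conn.execute(
        "SELECT last_heartbeat FROM coordinator WHERE slot = 1"
    ).fetchone()
    return row is not None and now - row[0] <= HEARTBEAT_TIMEOUT


def try_elect(conn, node_id, now):
    """Take over only if there is no entry or the entry is stale."""
    try:
        with conn:  # delete and insert run in one transaction
            conn.execute(
                "DELETE FROM coordinator"
                " WHERE slot = 1 AND ? - last_heartbeat > ?",
                (now, HEARTBEAT_TIMEOUT),
            )
            conn.execute(
                "INSERT INTO coordinator (slot, node_id, last_heartbeat)"
                " VALUES (1, ?, ?)",
                (node_id, now),
            )
        return True
    except sqlite3.IntegrityError:
        # A live coordinator row still exists, so this node stays a member.
        return False
```

A non-coordinator node would call `coordinator_is_valid` on each polling interval and call `try_elect` only when it returns False; the coordinator would call `send_heartbeat` and step down if it returns False.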
>>>>>>>>>>>
>>>>>>>>>>> We are currently working on a POC to test how this works with
>>>>>>>>>>> MB's slot based messaging algorithm.
>>>>>>>>>>>
>>>>>>>>>>> Thoughts?
>>>>>>>>>>>
>>>>>>>>>>> [1] https://wso2.org/jira/browse/MB-1664
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Asanka Abeyweera
>>>>>>>>>>> Senior Software Engineer
>>>>>>>>>>> WSO2 Inc.
>>>>>>>>>>>
>>>>>>>>>>> Phone: +94 712228648
>>>>>>>>>>> Blog: a5anka.github.io
>>>>>>>>>>>
>>>>>>>>>>> <https://wso2.com/signature>
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Architecture mailing list
>>>>>>>>>>> [email protected]
>>>>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Hasitha Aravinda,
>>>>>>>>>> Associate Technical Lead,
>>>>>>>>>> WSO2 Inc.
>>>>>>>>>> Email: [email protected]
>>>>>>>>>> Mobile : +94 718 210 200
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Nandika Jayawardana
>>>>>>>>> WSO2 Inc ; http://wso2.com
>>>>>>>>> lean.enterprise.middleware
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Akila Ravihansa Perera
>>>>>>>> WSO2 Inc.;  http://wso2.com/
>>>>>>>>
>>>>>>>> Blog: http://ravihansa3000.blogspot.com
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> *Imesh Gunaratne*
>>>> Software Architect
>>>> WSO2 Inc: http://wso2.com
>>>> T: +94 11 214 5345 M: +94 77 374 2057
>>>> W: https://medium.com/@imesh TW: @imesh
>>>> lean. enterprise. middleware
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Ramith Jayasinghe
>> Technical Lead
>> WSO2 Inc., http://wso2.com
>> lean.enterprise.middleware
>>
>> E: [email protected]
>> P: +94 772534930
>>
>>
>>
>
>
>
>


-- 
*Imesh Gunaratne*
Software Architect
WSO2 Inc: http://wso2.com
T: +94 11 214 5345 M: +94 77 374 2057
W: https://medium.com/@imesh TW: @imesh
lean. enterprise. middleware

