Hi Anjana,

Thank you for the suggestion. We have already done something similar: after creating the leader entry, a node waits for a backoff period and then checks that the leader entry is still the one it created before announcing the leader change.
On Tue, Aug 9, 2016 at 12:27 PM, Anjana Fernando <[email protected]> wrote:

> I see, thanks for the clarification, looks good!
>
> One small thing to consider is the following race. The current leader goes away and two other nodes compete to become the leader. Both read the table to check the last heartbeat and conclude, at the same time, that the leader is outdated. The first one deletes the entry and inserts its own; after that, the second one also deletes the existing entry and inserts its own. Both will think they became the leader, because both succeeded in adding the entry without an error.
>
> This can probably be fixed by checking back after a short time whether the current leader is actually me. Probabilistically this will work well, provided the waiting period is sufficiently longer than the typical database transaction a node needs for the earlier operations. Otherwise, we should make sure the transaction isolation level used in this scenario is at least REPEATABLE_READ, so that when we read the record it stays locked throughout the transaction. Some DBMSs do not support REPEATABLE_READ; in that case we should be able to use SERIALIZABLE, which most of them support.
>
> Cheers,
> Anjana.
>
> On Tue, Aug 9, 2016 at 11:11 AM, Maninda Edirisooriya <[email protected]> wrote:
>
>> Hi Anjana,
>>
>> After having an offline chat with Asanka, what I understood was that the leader election is done entirely via the database, with no network communication between nodes. The leader is first recorded in the database. Then the leader updates the node data periodically in the database. If some node notices that the data in the DB is outdated, that means the leader was disconnected. That node will then look at the created timestamp of the leader entry. If it is not very recent, that means no new leader was elected recently.
>> It will then try to update the leader entry with its own ID. As I understand it, the leader entry uses the leader ID and the timestamp as the primary key. Even if several nodes try to do this simultaneously, only one node will be able to update the entry successfully, thanks to the atomicity provided by the DB. The other members will notice that the timestamp of the leader was updated, and will accept the first node to update as the leader. Even after the leader is elected, the leader will only publish node data by updating the DB, not through network calls. Other nodes will just observe it and check the latest timestamps of the entry.
>>
>> *Maninda Edirisooriya*
>> Senior Software Engineer
>>
>> *WSO2, Inc.*
>> lean.enterprise.middleware.
>>
>> *Blog* : http://maninda.blogspot.com/
>> *E-mail* : [email protected]
>> *Skype* : @manindae
>> *Twitter* : @maninda
>>
>> On Tue, Aug 9, 2016 at 10:13 AM, Anjana Fernando <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I just noticed this thread. I have some concerns about this implementation. First of all, I don't think the statement made here, that an external service such as ZooKeeper doesn't work, is correct. If you have a ZK cluster (it is supposed to be used as a cluster), you will not have any issues. All the nodes have a list of endpoints for all the ZK nodes and connect to those, and ZK uses a quorum-based mechanism to keep its state. This makes sure all the users see a single version of the ZK data.
>>>
>>> Also, I guess the fundamental problem in the split-brain situation is that we need one external entity making the decision (e.g. the ZK cluster), because it should have oversight of the whole environment. I don't see how this RDBMS mechanism would solve that, because what it gives is a central location for state persistence. The decision of who is the leader is still made by the users, which can be problematic.
>>> In a network partition scenario, two groups of users will repeatedly override each other in the centralized RDBMS data, and it will go on forever. In the ZK situation there will be only one leader, and the nodes in the other partition simply won't be able to reach the leader until the network issues are sorted out.
>>>
>>> So I also think, as Imesh mentioned, that creating a coordination algorithm from scratch may not be a wise decision, and we should use proven technology/libraries to do that. On a side note, the main reason for not using ZK for this earlier was the hassle of bringing up another set of servers when our products are clustered. We knew that the split-brain scenario would occur in HZ, but maybe now we should provide an extension point to plug in an external service, for applications where the split-brain scenario is a show stopper.
>>>
>>> Cheers,
>>> Anjana.
>>>
>>> On Tue, Aug 9, 2016 at 4:45 AM, Kasun Indrasiri <[email protected]> wrote:
>>>
>>>> Hi Ramith/Asanka,
>>>>
>>>> The ESB/DSS ntask implementation is also based on HZ. I guess if this model works for the MB, we should make it generic for all such coordination requirements. (Thinking about using this in ESB 5.1?)
>>>>
>>>> On Fri, Aug 5, 2016 at 3:58 AM, Sajini De Silva <[email protected]> wrote:
>>>>
>>>>> Hi Maninda,
>>>>>
>>>>> Locking the database is supported by some databases, but there will be a huge performance impact, so we cannot use that approach. If this approach cannot be adopted, the only thing we can do is queue-wise load balancing through the slot coordinator. But in that case we cannot guarantee that the load will be equally distributed, since some queues can be loaded while others are idle.
>>>>> Also, we cannot have multiple slot coordinators handling the same queue, as it may cause several complications, such as the same slot being assigned to two nodes by different subscribers, message duplication, etc. Actually, this slot architecture was discussed in a separate mail thread before it was implemented.
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Fri, Aug 5, 2016 at 3:12 PM, Maninda Edirisooriya <[email protected]> wrote:
>>>>>
>>>>>> Hi Sajini,
>>>>>>
>>>>>> Yes, that is what I meant. As the number of slots is proportional to the number of messages passing through the cluster, slot delivery should not be handled by the coordinator when there is only one coordinator in the cluster, because that makes it a bottleneck for scaling the message throughput of the cluster. If there is only a single coordinator, it should handle only operations that are not proportional to the message throughput of the cluster, i.e. only tasks like adding / removing subscribers. As this is not the current implementation, we can consider a multiple-coordinator approach, where the number of coordinators scales with the message throughput. I am not sure whether locking the database per transaction would achieve this coordinator scalability in the multiple-coordinator implementation.
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> On Fri, Aug 5, 2016 at 2:42 PM, Sajini De Silva <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Maninda,
>>>>>>>
>>>>>>> On Fri, Aug 5, 2016 at 2:28 PM, Maninda Edirisooriya <[email protected]> wrote:
>>>>>>>
>>>>>>>> @Sajini,
>>>>>>>>
>>>>>>>> But the number of slots is proportional to the number of messages passing through the MB, and these slots need to be handled by the coordinator. That is what I meant by "information related to meta data of messages pass through a single coordinator". Ideally, after the senders and receivers are subscribed to the cluster, the coordinator should have nothing to do until they are removed or changed.
>>>>>>>>
>>>>>>>
>>>>>>> Even though it is possible to have multiple coordinators with some effort (locking the database for a whole transaction, or the workload distribution described by Ramith), the coordinator may have other work to do besides adding and removing subscribers. As I said earlier, our MB message distribution system is based on the slot architecture, and slots are managed by the coordinator. You can read [1] to understand more about the slot architecture in MB.
>>>>>>>
>>>>>>> [1] http://sajinid.blogspot.com/2015/03/wso2-message-broker-300-slot-based.html
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>>
>>>>>>>> @Ramith,
>>>>>>>>
>>>>>>>> +1 for multiple coordinators by partitioning the cluster, which maintains the simplicity and correctness of the algorithm rather than compromising simplicity for a less important factor like "delivering a good mix of messages".
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> On Fri, Aug 5, 2016 at 2:05 PM, Ramith Jayasinghe <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> @Imesh,
>>>>>>>>> We can prove that doing leader election using a lib (where we maintain cluster state in another place, a.k.a. DB) will not solve our original problem (this also relates to our past experience with both ZooKeeper and Hazelcast).
>>>>>>>>> We can make this implementation a common component if other products have a use for it. BPS might be able to use it, since their data is also in the database.
>>>>>>>>>
>>>>>>>>> @Malaka:
>>>>>>>>> The VFS scenario can't be solved by relying on this implementation. Why? You can have access to the DB but not to the VFS resources/files (and vice versa). This is the same point we explained before.
>>>>>>>>> In the Ntask implementation, if tasks are stored in the database, then using this implementation makes sense.
>>>>>>>>>
>>>>>>>>> @Akila,
>>>>>>>>> Implementing a (distributed) queue algorithm is non-trivial. Having one coordinator (a single source of truth) keeps things simple, hence it's a conscious design decision we agreed on during the initial stages. However, a possible extension to this scheme is to have multiple coordinators (each responsible for coordinating a subset of the queues in the cluster), which would be somewhat similar to Kafka.
>>>>>>>>> Even if it's preferable to have no coordinator at all (to decide how messages are disseminated in the cluster), that will make us give up desired behaviour such as delivering a good mix of messages (from different publishers) to consumers in a cluster. Having said this, we have ongoing research on how to improve the algorithm, and we would like to try out both of these approaches.
>>>>>>>>>
>>>>>>>>> On Fri, Aug 5, 2016 at 1:31 PM, Malaka Silva <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> The same issue with Hazelcast can be experienced with ESB inbounds (running on top of NTASK) and VFS distribution locks.
>>>>>>>>>>
>>>>>>>>>> The idea that only a single worker works at a given time breaks if a Hazelcast heartbeat fails. This will make two workers work in parallel.
>>>>>>>>>>
>>>>>>>>>> Also, with distributed locking there is no guarantee that a file is processed by only one worker.
>>>>>>>>>>
>>>>>>>>>> So in the case of a network failure with the DB, it makes sense to stop processing until it has recovered.
>>>>>>>>>> Also, by making this component generic, ESB can reuse it.
>>>>>>>>>>
>>>>>>>>>> On Fri, Aug 5, 2016 at 9:21 AM, Asitha Nanayakkara <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Imesh,
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Aug 5, 2016 at 7:33 AM, Imesh Gunaratne <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Aug 5, 2016 at 7:31 AM, Imesh Gunaratne <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> You can see here [3] how K8S has implemented a leader election feature for the products deployed on top of it to utilize.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Correction: Please refer to [4].
>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Aug 4, 2016 at 7:27 PM, Asanka Abeyweera <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Imesh,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We are not implementing this to overcome a limitation in the coordination algorithm available in Hazelcast. We are implementing this because we need an RDBMS-based coordination algorithm (not a network-based algorithm).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>> Are you saying that database connections do not use the same network used by Hazelcast?
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The reason is that a network-based election algorithm will always elect multiple leaders when the network is partitioned. But if we use an RDBMS-based algorithm, this will not happen.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>> I do not think your argument is correct. If there is a problem with the network, it may apply to both the Hazelcast-based solution and the database-based solution.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> Yes, if the same network interface is used, a network partition will cause all types of connections to be partitioned. But a user can use multiple network interfaces for the database, Hazelcast and Thrift.
>>>>>>>>>>>
>>>>>>>>>>> Following is the scenario we are trying to solve in MB.
>>>>>>>>>>>
>>>>>>>>>>> In MB, all the details related to messages, subscriptions, queues, topics etc. are stored in the database, and we operate depending on that information. If an MB node can't connect to the database, the node is ineffective in the cluster until it can make a database connection.
>>>>>>>>>>>
>>>>>>>>>>> We have seen instances where the Hazelcast cluster gets partitioned for some period of time. The reasons were:
>>>>>>>>>>>
>>>>>>>>>>> 1. Due to heavy load, Hazelcast couldn't process or send (sometimes both) heartbeats, hence a network partition for the Hazelcast cluster
>>>>>>>>>>> 2. An actual network partition of the Hazelcast cluster
>>>>>>>>>>>
>>>>>>>>>>> In both scenarios the database connection was working. In that case we get two coordinators elected through Hazelcast and working on the same database to deliver the messages. This leads to inconsistencies in the cluster behavior (for instance duplicate message delivery, corrupted subscription states, etc.).
>>>>>>>>>>>
>>>>>>>>>>> Since the point of interest for MB is the database, we decided to do the coordinator election through the database as well. If a node can't connect to the database, then the MB won't operate anyway.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Asitha
>>>>>>>>>>>
>>>>>>>>>>>>> [4] http://blog.kubernetes.io/2016/01/simple-leader-election-with-Kubernetes.html
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Aug 4, 2016 at 7:16 PM, Imesh Gunaratne <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Asanka,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Do we really need to implement a leader election algorithm on our own? AFAIU this is a complex problem which has already been solved by several algorithms [1]. IMO it would be better to go ahead with an existing, well-established implementation on etcd [1] or Consul [2].
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Those provide HTTP APIs for clients to make leader election calls. [3] is a client library written in Node.js for etcd-based leader election.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [1] https://www.projectcalico.org/using-etcd-for-elections
>>>>>>>>>>>>>>>> [2] https://www.consul.io/docs/guides/leader-election.html
>>>>>>>>>>>>>>>> [3] https://www.npmjs.com/package/etcd-leader
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Aug 3, 2016 at 5:12 PM, Asanka Abeyweera <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Maninda,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Since we are using the RDBMS to poll the node status, the cluster will not end up in situation 1, 2 or 3. With this approach we consider a node unreachable when it cannot access the database. Therefore an unreachable node can never be the leader.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> As you have mentioned, we are currently using the RDBMS as an atomic global variable to create the coordinator entry.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Aug 2, 2016 at 5:22 PM, Maninda Edirisooriya <[email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Asanka,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> As I understand it, the accuracy of electing the leader correctly depends on the election mechanism used with the RDBMS, because there can be edge cases like:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1. An unreachable leader reactivates during the election process: then who becomes the leader?
>>>>>>>>>>>>>>>>>> 2. The elected leader becomes unreachable before the election is completed: will there then be a situation where there is no leader?
>>>>>>>>>>>>>>>>>> 3. A leader and a set of nodes are disconnected from the other part of the cluster, and while the leader is trying to remove unreachable members, the other part is calling an election to choose a leader: who will win?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> An RDBMS-based election algorithm should handle such cases without bringing the cluster to an inconsistent state or a deadlock in any concurrent case. If all these kinds of cases cannot be handled, isn't it better to keep the current Hazelcast clustering and use the RDBMS only to handle the split-brain scenario? In other words, when a new Hazelcast leader is elected, it should be recorded in the RDBMS. If another split party has already elected a leader, the node that is about to write to the RDBMS should avoid updating it. Simply put, the RDBMS can be used as an atomic global variable holding the leader name, by modifying the Hazelcast clustering. WDYT?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Jul 28, 2016 at 4:38 PM, Asanka Abeyweera <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Akila,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Let me explain the issue in a different way. Let's assume the MB nodes are using two different network interfaces for Hazelcast communication and database communication. With such a configuration, it is possible for failures to affect only the network interface used for Hazelcast communication on some nodes. When this happens, there will be two or more Hazelcast clusters due to the network segmentation, and as a result there will be multiple coordinators. Since every node still has access to the database, multiple coordinators can affect the correctness of the data stored in the DB. But if we use an RDBMS-based approach, we won't have multiple coordinators due to a network partition in Hazelcast. This is one advantage we get from this approach.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Even when we use ZooKeeper or Raft, the same issue will be there, since we are using different interfaces for Hazelcast communication and DB communication.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, Jul 28, 2016 at 2:56 PM, Akila Ravihansa Perera <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> What's the advantage of using an RDBMS (even as an alternative) to implement a leader/coordinator election? If the network connection to the DB fails, then this will be a single point of failure. I don't think we can scale RDBMS instances and expect the election algorithm to work. That would be reducing this problem to another problem (electing a coordinator RDBMS instance).
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> IMHO it would be better to look at the ZooKeeper Atomic Broadcast (ZAB) [1] or Raft leader election [2] algorithms, which already have proven results.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab1.0
>>>>>>>>>>>>>>>>>>>> [2] http://libraft.io/
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, Jul 28, 2016 at 1:42 PM, Nandika Jayawardana <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> +1 to make it a common component. We have the clustering implementation for the BPEL component based on Hazelcast. If the coordination is available at the RDBMS level, we can remove the Hazelcast dependency.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>>>>>>> Nandika
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Thu, Jul 28, 2016 at 1:28 PM, Hasitha Aravinda <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Can we make it a common component that is not tightly coupled with MB? BPS has the same requirement.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>> Hasitha.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Thu, Jul 28, 2016 at 9:47 AM, Asanka Abeyweera <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> In MB, we have used a coordinator-based approach to manage the distributed messaging algorithm in the cluster. Currently Hazelcast is used to elect the coordinator. But one issue we faced with Hazelcast is that, during a network segmentation (split brain), Hazelcast can elect two or more coordinators in the cluster. This affects the correctness of the distributed messaging algorithm, since there are some tables in the database that should only be edited by a single node (i.e. the coordinator).
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> As a solution to this problem, we have implemented a minimum node count based approach [1] that deactivates a set of partitioned nodes, to stop multiple nodes becoming coordinators until the network segmentation issue is fixed.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> As an alternative solution, we are thinking of implementing an RDBMS-based approach to elect the coordinator node in the cluster. By doing this we can make sure that, even during a network segmentation, only one node will be elected as the coordinator node, since the election happens through the database.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> The algorithm will use a polling mechanism to check the validity of the nodes. To make the election algorithm scalable, only the coordinator node will check the status of all the nodes in the cluster, and it will inform the other nodes through the database when a member is added or has left. The other nodes will only check the status of the coordinator node. When a node detects that the coordinator is invalid, it will start an election to elect a new coordinator.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> We are currently working on a POC to test how this works with MB's slot-based messaging algorithm.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> [1] https://wso2.org/jira/browse/MB-1664
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>> Asanka Abeyweera
>>>>>>>>>>>>>>>>>>>>>>> Senior Software Engineer
>>>>>>>>>>>>>>>>>>>>>>> WSO2 Inc.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Phone: +94 712228648
>>>>>>>>>>>>>>>>>>>>>>> Blog: a5anka.github.io
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> <https://wso2.com/signature>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>>>>>>> Architecture mailing list
>>>>>>>>>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>> Hasitha Aravinda,
>>>>>>>>>>>>>>>>>>>>>> Associate Technical Lead,
>>>>>>>>>>>>>>>>>>>>>> WSO2 Inc.
>>>>>>>>>>>>>>>>>>>>>> Email: [email protected]
>>>>>>>>>>>>>>>>>>>>>> Mobile : +94 718 210 200
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>> Nandika Jayawardana
>>>>>>>>>>>>>>>>>>>>> WSO2 Inc ; http://wso2.com
>>>>>>>>>>>>>>>>>>>>> lean.enterprise.middleware
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> Akila Ravihansa Perera
>>>>>>>>>>>>>>>>>>>> WSO2 Inc.; http://wso2.com/
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Blog: http://ravihansa3000.blogspot.com
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> *Imesh Gunaratne*
>>>>>>>>>>>>>>>> Software Architect
>>>>>>>>>>>>>>>> WSO2 Inc: http://wso2.com
>>>>>>>>>>>>>>>> T: +94 11 214 5345 M: +94 77 374 2057
>>>>>>>>>>>>>>>> W: https://medium.com/@imesh TW: @imesh
>>>>>>>>>>>>>>>> lean. enterprise. middleware
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Ramith Jayasinghe
>>>>>>>>>>>>>> Technical Lead
>>>>>>>>>>>>>> WSO2 Inc., http://wso2.com
>>>>>>>>>>>>>> lean.enterprise.middleware
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> E: [email protected]
>>>>>>>>>>>>>> P: +94 772534930
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> *Asitha Nanayakkara* <http://asitha.github.io/>
>>>>>>>>>>> Senior Software Engineer
>>>>>>>>>>> WSO2, Inc. <http://wso2.com/>
>>>>>>>>>>> Mob: +94 77 853 0682
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Best Regards,
>>>>>>>>>>
>>>>>>>>>> Malaka Silva
>>>>>>>>>> Senior Technical Lead
>>>>>>>>>> M: +94 777 219 791
>>>>>>>>>> Tel : 94 11 214 5345
>>>>>>>>>> Fax :94 11 2145300
>>>>>>>>>> Skype : malaka.sampath.silva
>>>>>>>>>> LinkedIn : http://www.linkedin.com/pub/malaka-silva/6/33/77
>>>>>>>>>> Blog : http://mrmalakasilva.blogspot.com/
>>>>>>>>>>
>>>>>>>>>> WSO2, Inc.
>>>>>>>>>> lean . enterprise . middleware
>>>>>>>>>> https://wso2.com/signature
>>>>>>>>>> http://www.wso2.com/about/team/malaka-silva/
>>>>>>>>>> https://store.wso2.com/store/
>>>>>>>>>>
>>>>>>>>>> Don't make Trees rare, we should keep them with care
>>>>>>>
>>>>>>> --
>>>>>>> Sajini De Silva
>>>>>>> Senior Software Engineer; WSO2 Inc.; http://wso2.com ,
>>>>>>> Email: [email protected]
>>>>>>> Blog: http://sajinid.blogspot.com/
>>>>>>> Git hub profile: https://github.com/sajinidesilva
>>>>>>>
>>>>>>> Phone: +94 712797729
>>>>
>>>> --
>>>> Kasun Indrasiri
>>>> Director, Integration Technologies
>>>> WSO2, Inc.; http://wso2.com
>>>> lean.enterprise.middleware
>>>>
>>>> cell: +1 650 450 2293
>>>> Blog : http://kasunpanorama.blogspot.com/
>>>
>>> --
>>> *Anjana 
Fernando* >>> Associate Director / Architect >>> WSO2 Inc. | http://wso2.com >>> lean . enterprise . middleware >>> >>> _______________________________________________ >>> Architecture mailing list >>> [email protected] >>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>> >>> >> > > > -- > *Anjana Fernando* > Associate Director / Architect > WSO2 Inc. | http://wso2.com > lean . enterprise . middleware > -- Asanka Abeyweera Senior Software Engineer WSO2 Inc. Phone: +94 712228648 Blog: a5anka.github.io <https://wso2.com/signature>
_______________________________________________ Architecture mailing list [email protected] https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
