Hi Asanka,

Adding dev@

On Mon, Jun 8, 2015 at 12:04 PM, Asanka Abeyweera <[email protected]> wrote:

> Hi all,
>
> How are we going to handle following case with hazelcast?
>
> Assume we had an 8-node MB cluster and, due to a network failure, the cluster
> divided into two partitions with 4 nodes each. Now each partition has its
> own Hazelcast cluster, but both partitions point to a single DB.
> Since the slot manager uses a range to define a slot, a slot can include
> messages from the other partition's publishers. One side effect of this is
> message duplication, which should not happen with queues. Another is
> message content being removed by the other partition before delivery. There
> can be some other complications too.
>
> WDYT?
>

Yes, I agree with you on this. In my opinion, in a Hazelcast-partitioned
scenario we can't take decisions depending on Hazelcast. What matters here
is DB access. If we can have some sort of a database-level lock for the slot
coordinator, then we might be able to get away with most of the
complexities involved. I've talked about this in another mail thread as well
[1]. If there is no DB access, there is nothing a slot coordinator can do
anyway.

Since DB access is vital for the slot coordinator, we might be better off
using a database-specific locking mechanism at all times, without depending
on Hazelcast. WDYT?
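
Something like the following lease-based lock is what I have in mind (just a
sketch; the COORDINATOR_LEASE table, its columns, and the lease interval are
assumptions for illustration, not our current schema):

// Sketch only: a single-row lease table acts as the coordinator lock.
// Assumed (hypothetical) schema:
//   CREATE TABLE COORDINATOR_LEASE (LEASE_ID INT PRIMARY KEY,
//                                   NODE_ID VARCHAR(64),
//                                   EXPIRES_AT BIGINT);
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class DbCoordinatorLease {

    private static final long LEASE_MILLIS = 30_000; // assumed lease interval

    /**
     * Try to take (or renew) the coordinator lease. Returns true if this
     * node is the coordinator after the call. Relies only on the DB, so it
     * keeps working even when Hazelcast is partitioned.
     */
    public boolean tryAcquire(Connection conn, String nodeId) throws SQLException {
        long now = System.currentTimeMillis();
        // Atomically claim the row if it is free, expired, or already ours.
        String sql = "UPDATE COORDINATOR_LEASE "
                   + "SET NODE_ID = ?, EXPIRES_AT = ? "
                   + "WHERE LEASE_ID = 1 AND (EXPIRES_AT < ? OR NODE_ID = ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, nodeId);
            ps.setLong(2, now + LEASE_MILLIS);
            ps.setLong(3, now);
            ps.setString(4, nodeId);
            return ps.executeUpdate() == 1; // exactly one node can win
        }
    }
}

Each node would call tryAcquire() periodically; whoever holds the unexpired
row acts as slot coordinator.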

[1] [MB] Hazelcast coordinator issue after cluster partitioning

Thanks,
Asitha


>
>
> On Mon, Jun 1, 2015 at 11:20 AM, Asitha Nanayakkara <[email protected]>
> wrote:
>
>> Hi all,
>>
>> What if we use the first member of the Hazelcast node list as the coordinator
>> (suggested by Hazelcast support)? On every member-left and member-joined
>> event we evaluate the node list and check whether its first member has
>> changed. If it has changed, we fire a coordinator-changed event with the
>> new coordinator details (this should be done in the kernel), and we write
>> our coordinator logic based on this event. The current slot coordinator
>> might learn that it is no longer the coordinator, so it can stop; the new
>> slot coordinator can start; the others can update the coordinator details.
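
A rough sketch of that first-member check, using the stock Hazelcast 3.x
Cluster/MembershipListener API; the fireCoordinatorChanged() hook is
hypothetical, standing in for the kernel's coordinator-changed event:

import com.hazelcast.core.Cluster;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.Member;
import com.hazelcast.core.MemberAttributeEvent;
import com.hazelcast.core.MembershipEvent;
import com.hazelcast.core.MembershipListener;

public class OldestMemberCoordinatorElector implements MembershipListener {

    private final HazelcastInstance hazelcast;
    private volatile Member currentCoordinator;

    public OldestMemberCoordinatorElector(HazelcastInstance hazelcast) {
        this.hazelcast = hazelcast;
        Cluster cluster = hazelcast.getCluster();
        cluster.addMembershipListener(this);
        // getMembers() is ordered with the oldest member first
        currentCoordinator = cluster.getMembers().iterator().next();
    }

    @Override
    public void memberAdded(MembershipEvent event) {
        reevaluate();
    }

    @Override
    public void memberRemoved(MembershipEvent event) {
        reevaluate();
    }

    @Override
    public void memberAttributeChanged(MemberAttributeEvent event) {
        // not relevant for coordinator election
    }

    private synchronized void reevaluate() {
        Member first = hazelcast.getCluster().getMembers().iterator().next();
        if (!first.equals(currentCoordinator)) {
            currentCoordinator = first;
            fireCoordinatorChanged(first);
        }
    }

    private void fireCoordinatorChanged(Member newCoordinator) {
        // Hypothetical kernel hook: old coordinator stops, new one starts,
        // other nodes update the coordinator details.
    }
}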
>>
>> IMHO, at all times, regardless of whether the cluster is partitioned or
>> not, there should be only one slot coordinator.
>>
>> In a situation where each node has access to the DB (separate network card)
>> but doesn't have access to coordination through Hazelcast (malfunctioning
>> network card), there will be a cluster partition, and multiple slot
>> coordinators will operate. If there are publishers and subscribers for the
>> same queue in each partition, messages will be duplicated and each slot
>> coordinator will deliver messages from overlapping slots on its own.
>>
>> My point here is, in a partition scenario, if we have DB access from all
>> partitions, having multiple slot coordinators will be problematic. All
>> these options assume Thrift is working without any issue. If Thrift is not
>> working between partitions, then having a single slot coordinator will
>> starve subscribers in the other partitions.
>>
>> So we have four communication links we need to look at:
>>
>>    - Database link
>>    - Coordination link
>>    - Thrift link
>>    - Publisher/subscriber link (AMQP and MQTT ports)
>>
>> We need to analyze the impact of losing these links in any combination.
>> I may be totally or partially wrong on this.
>>
>> Thanks
>> Asitha
>>
>> On Mon, Jun 1, 2015 at 9:32 AM, Asanka Abeyweera <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> When the two partitions connect again, can the cluster select a new slot
>>> manager node (other than the ones already present in the two partitions)?
>>> We might also have to understand how the Hazelcast lists and maps are
>>> merged internally in these scenarios to fully answer this.
>>>
>>> On Sun, May 31, 2015 at 8:08 PM, Ramith Jayasinghe <[email protected]>
>>> wrote:
>>>
>>>> Well, I'm not actually asking to implement this. BUT we absolutely have
>>>> to have a reconciliation model; otherwise we are screwed.
>>>>
>>>>
>>>> On Sun, May 31, 2015 at 7:28 PM, Hasitha Hiranya <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We need to merge all the operating lists (fresh slots/assigned
>>>>> slots/overlapped slots/returned slots) of the two slot managers together.
>>>>>
>>>>> If we meet a conflict during merging (the same slot assigned to different
>>>>> nodes), we should give a BIG warning, and maybe continue. At that point we
>>>>> cannot do anything from the slot manager side; individual nodes will be
>>>>> delivering the same message.
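
A rough sketch of such a merge with the conflict warning, assuming a
simplified slot-id to node-id assignment map (the class name and map layout
are illustrative, not the actual slot manager structures):

import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

public class SlotAssignmentMerger {

    private static final Logger log =
            Logger.getLogger(SlotAssignmentMerger.class.getName());

    /**
     * Merge the assigned-slot maps (slot id -> node id) of the two slot
     * managers. On a conflict (same slot assigned to different nodes) we
     * keep the first assignment and log a BIG warning, as discussed above.
     */
    public Map<String, String> merge(Map<String, String> partitionA,
                                     Map<String, String> partitionB) {
        Map<String, String> merged = new HashMap<>(partitionA);
        for (Map.Entry<String, String> entry : partitionB.entrySet()) {
            String existingNode = merged.get(entry.getKey());
            if (existingNode != null && !existingNode.equals(entry.getValue())) {
                log.severe("CONFLICT: slot " + entry.getKey() + " assigned to both "
                        + existingNode + " and " + entry.getValue()
                        + "; duplicate delivery is possible");
                continue; // keep the existing assignment
            }
            merged.put(entry.getKey(), entry.getValue());
        }
        return merged;
    }
}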
>>>>>
>>>>> Otherwise we need to introduce some abortImmediately method, which is
>>>>> hectic.
>>>>>
>>>>> So, yeah, Ramith's proposal looks simple enough. When the partitions are
>>>>> merged, allow the bigger partition to continue, and do not allow any new
>>>>> slot assignments to nodes which are not in that partition; rather, put a
>>>>> BIG log saying this node is useless and not in the cluster, please restart.
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Sun, May 31, 2015 at 7:45 AM, Pamod Sylvester <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> In this case we might need to sort the messages which are lying in a
>>>>>> queue or a durable subscription, for message ordering, i.e. maintaining
>>>>>> timestamps etc.
>>>>>>
>>>>>>
>>>>>> On Sunday, May 31, 2015, Ramith Jayasinghe <[email protected]> wrote:
>>>>>>
>>>>>>> Suppose there are two network partitions:
>>>>>>>  P1, P2 where nodecount(P1) >= nodecount(P2)
>>>>>>>
>>>>>>>  def: nodecount = number of broker nodes in the partition.
>>>>>>>
>>>>>>>  So the two sets of brokers will operate on their own during the
>>>>>>> partition, each with their own coordinator, which is bad. We need to
>>>>>>> find/observe what the exact behavior is:
>>>>>>>
>>>>>>>  1) how slots are being used
>>>>>>>  2) will this leave stale messages in the DB?
>>>>>>>  3) will there be duplicates (which is OK at this point, better than
>>>>>>> losing messages)
>>>>>>>
>>>>>>> And the biggest problem we want to solve is what we are going to do
>>>>>>> when the partitions are merged.
>>>>>>> My proposal is:
>>>>>>>  the partition with the biggest node count (max(nodecount(P1),
>>>>>>> nodecount(P2))) continues to operate,
>>>>>>> and all other nodes have to be restarted (by the user) if nodecount(P2) > 2.
>>>>>>>
>>>>>>> thoughts?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Ramith Jayasinghe
>>>>>>> Technical Lead
>>>>>>> WSO2 Inc., http://wso2.com
>>>>>>> lean.enterprise.middleware
>>>>>>>
>>>>>>> E: [email protected]
>>>>>>> P: +94 777542851
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> *Pamod Sylvester *
>>>>>>
>>>>>> *WSO2 Inc.; http://wso2.com <http://wso2.com>*
>>>>>> cell: +94 77 7779495
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> *Hasitha Abeykoon*
>>>>> Senior Software Engineer; WSO2, Inc.; http://wso2.com
>>>>> *cell:* *+94 719363063*
>>>>> *blog: **abeykoon.blogspot.com* <http://abeykoon.blogspot.com>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Ramith Jayasinghe
>>>> Technical Lead
>>>> WSO2 Inc., http://wso2.com
>>>> lean.enterprise.middleware
>>>>
>>>> E: [email protected]
>>>> P: +94 777542851
>>>>
>>>>
>>>
>>>
>>> --
>>> Asanka Abeyweera
>>> Software Engineer
>>> WSO2 Inc.
>>>
>>> Phone: +94 712228648
>>> Blog: a5anka.github.io
>>>
>>
>>
>>
>> --
>> *Asitha Nanayakkara*
>> Software Engineer
>> WSO2, Inc. http://wso2.com/
>> Mob: + 94 77 85 30 682
>>
>>
>
>
> --
> Asanka Abeyweera
> Software Engineer
> WSO2 Inc.
>
> Phone: +94 712228648
> Blog: a5anka.github.io
>



-- 
*Asitha Nanayakkara*
Software Engineer
WSO2, Inc. http://wso2.com/
Mob: + 94 77 85 30 682
_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev
