On 17 Apr 2013, at 07:24, Bela Ban <[email protected]> wrote:

> If we go with a primary partition approach, then only the primary
> partition will be allowed to make progress (a.k.a. accept changes), we
> therefore won't have any conflicts.
>
> The partition approach must be chosen so there can only be 1 primary
> partition max, and the minority partitions shut down or turn read-only.
> When merging, minority partitions need to get the state from the primary
> partition, so state transfer on a merge always needs to flow from the
> primary partition to the minority partition(s).

Correct. This was the 'special behaviour' that I was asking about:
whether this state transfer from the primary partition to the secondary
partitions happens during a merge, or whether the minority partition
nodes are just wiped and treated as fresh joiners.

>
> I don't know how this could be done, but perhaps an approach would be to
> treat members of minority partitions on a merge as if they were fresh
> joiners ?
>
> On 4/17/13 10:31 AM, Adrian Nistor wrote:
>> In case of MergeView the cluster topology manager running on (the new)
>> coordinator will request the current cache topology from all members and
>> will compute a new topology as the union of all. The new topology id is
>> computed as the max + 2 of the existing topology ids. Any currently
>> pending rebalance in any subpartition is ended now and a new rebalance
>> is triggered for the new cluster. No data version conflict resolution is
>> performed => chaos :)
>>
>> On 04/16/2013 10:05 PM, Manik Surtani wrote:
>>> Guys - I've started documenting this here [1] and will put together a
>>> prototype this week.
>>>
>>> One question though, perhaps one for Dan/Adrian - is there any special
>>> handling for state transfer if a MergeView is detected?
>>>
>>> - M
>>>
>>> [1] https://community.jboss.org/wiki/DesignDealingWithNetworkPartitions
>>>
>>> On 6 Apr 2013, at 04:26, Bela Ban <[email protected]> wrote:
>>>
>>>>
>>>> On 4/5/13 3:53 PM, Manik Surtani wrote:
>>>>> Guys,
>>>>>
>>>>> So this is what I have in mind for this, looking for opinions.
>>>>>
>>>>> 1. We write a SplitBrainListener which is registered when the
>>>>> channel connects. The aim of this listener is to identify when we
>>>>> have a partition. This can be identified when a view change is
>>>>> detected, and the new view is significantly smaller than the old
>>>>> view. Easier to detect for large clusters, but smaller clusters will
>>>>> be harder - trying to decide between a node leaving vs a partition.
>>>>> (Any better ideas here?)
>>>>>
>>>>> 2. The SBL flips a switch in an interceptor
>>>>> (SplitBrainHandlerInterceptor?) which switches the node to be
>>>>> read-only (reject invocations that change the state of the local
>>>>> node) if it is in the smaller partition (newView.size < oldView.size
>>>>> / 2). Only works reliably for odd-numbered cluster sizes, and the
>>>>> issues with small clusters seen in (1) will apply here as well.
>>>>>
>>>>> 3. The SBL can flip the switch in the interceptor back to normal
>>>>> operation once a MergeView is detected.
>>>>>
>>>>> It's nowhere near perfect but at least it means that we can recommend
>>>>> enabling this and setting up an odd number of nodes, with a cluster
>>>>> size of at least N if you want to reduce inconsistency in your grid
>>>>> during partitions.
>>>>>
>>>>> Is this even useful?
>>>>
>>>> So I assume this is to shut down (or make read-only) non primary
>>>> partitions. I'd go with an approach similar to [1] section 5.6.2, which
>>>> makes a partition read-only once it drops below a certain number of nodes
>>>> N.
>>>>
>>>>
>>>>> Bela, is there a more reliable mechanism to detect a split in (1)?
>>>> I'm afraid not. We never know whether a large number of members being
>>>> removed from the view means that they left, or that we have a partition,
>>>> e.g. because a switch crashed.
>>>>
>>>> One thing you could do though is for members who are about to leave
>>>> regularly to broadcast a LEAVE message, so that when the view is
>>>> received, the SBL knows those members, and might be able to determine
>>>> better whether we have a partition, or not.
>>>>
>>>> [1] http://www.jgroups.org/manual-3.x/html/user-advanced.html, section
>>>> 5.6.2
>>>>
>>>> --
>>>> Bela Ban, JGroups lead (http://www.jgroups.org)
>>>> _______________________________________________
>>>> infinispan-dev mailing list
>>>> [email protected]
>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>> --
>>> Manik Surtani
>>> [email protected]
>>> twitter.com/maniksurtani
>>>
>>> Platform Architect, JBoss Data Grid
>>> http://red.ht/data-grid
>>
>
> --
> Bela Ban, JGroups lead (http://www.jgroups.org)

--
Manik Surtani
[email protected]
twitter.com/maniksurtani

Platform Architect, JBoss Data Grid
http://red.ht/data-grid
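
P.S. To make the size rule from (2) and Bela's threshold N concrete, here
is a minimal sketch in plain Java. It is only an illustration under stated
assumptions, not Infinispan or JGroups code: PartitionPolicy, Mode and
minNodes are hypothetical names, and views are modelled as plain lists of
member names rather than JGroups View objects.

```java
import java.util.List;

/** Hypothetical helper for illustration; not part of Infinispan or JGroups. */
final class PartitionPolicy {

    enum Mode { READ_WRITE, READ_ONLY }

    // Assumed knob: Bela's threshold N. A value of 0 disables the check.
    private final int minNodes;

    PartitionPolicy(int minNodes) {
        this.minNodes = minNodes;
    }

    /** Decides the mode of the local node after a (non-merge) view change. */
    Mode onViewChange(List<String> oldView, List<String> newView) {
        // Rule (2): read-only if newView.size < oldView.size / 2.
        // Written as 2 * new < old so that integer division does not
        // round the threshold down (5 / 2 == 2 in Java).
        boolean minority = 2 * newView.size() < oldView.size();

        // Bela's variant: read-only once the partition drops below N nodes.
        boolean belowThreshold = newView.size() < minNodes;

        return (minority || belowThreshold) ? Mode.READ_ONLY : Mode.READ_WRITE;
    }

    /** Step (3): a merge flips the switch back to normal operation. */
    Mode onMerge() {
        return Mode.READ_WRITE;
    }
}
```

For a 5-node cluster split 3/2, the 2-node side goes read-only and the
3-node side keeps accepting writes. For an even 3/3 split of 6 nodes
neither side is a strict minority, so both stay writable unless minNodes
is set above 3 (in which case both go read-only). That is the
odd-cluster-size caveat from (2). The sketch says nothing about telling a
partition apart from regular leaves, which remains the open problem in (1).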
