Sounds good, although I think ISPN-3140 should remain in its current scope and just address point 1 below. We can create a separate JIRA for point 2, since I think even point 1 on its own is useful for some use cases (as you say, where data loss isn't a concern).
On 31 May 2013, at 17:40, Adrian Nistor <[email protected]> wrote:

> Yes, ISPN-1394 has a broader scope, but the proposed solution for ISPN-3140
> solves quite a lot of ISPN-1394 and is not complex. We might not even need
> ISPN-1394 soon, unless somebody really wants to control data ownership down
> to segment granularity. If we only want to batch joins/leaves and manually
> kick out nodes, with or without losing their data, then this proposal should
> be enough. This solution should not prevent implementation of ISPN-1394 in
> the future and will not need to be removed/undone.
>
> Here are the details:
>
> 1. Add a JMX writable attribute (or operation?) to ClusterTopologyManager
> (name it suppressRehashing?) that is false by default but should also be
> configurable via API or XML. While this attribute is true, the
> ClusterTopologyManager queues all join/leave/exclude (see below) requests
> and does not execute them on the spot as it normally would. The value of
> this attribute is ignored on all nodes but the coordinator. When it is set
> back to false, all queued operations (except the ones that cancel each other
> out) are executed. The setter should be synchronous, so when setting it back
> to false it does not return until the queue is empty and all rehashing has
> been processed.
>
> 2. We add a JMX operation excludeNodes(list of addresses) to
> ClusterTopologyManager. Calling this method on any node but the coordinator
> is a no-op. This operation removes the node from the topology (almost as if
> it left) and forces a rebalance. The node is still present in the current CH
> but not in the pending CH; it is basically disowned by all its data, which
> is now being transferred to other (non-excluded) nodes. At the end of the
> rebalance the node is removed from the topology for good and can be shut
> down without losing data. Note that if suppressRehashing == true, the
> excludeNodes(..) operation just queues them for later removal.
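[Editor's note: the queueing behaviour proposed in point 1 above can be sketched as a small self-contained simulation. All names here (Coordinator, TopologyEvent, onEvent) are illustrative stand-ins, not actual Infinispan API; the rebalance itself is stubbed out.]

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch of the proposed coordinator-side queueing: while
// suppressRehashing is true, topology events are queued; setting it back to
// false synchronously drains the queue, skipping pairs that cancel out.
public class RehashQueueSketch {

    enum Kind { JOIN, LEAVE, EXCLUDE }

    record TopologyEvent(Kind kind, String node) {}

    static class Coordinator {
        private boolean suppressRehashing;                    // false by default
        private final Queue<TopologyEvent> queue = new ArrayDeque<>();
        private final List<TopologyEvent> applied = new ArrayList<>();

        void onEvent(TopologyEvent e) {
            if (suppressRehashing) {
                // A LEAVE cancels a queued JOIN of the same node: neither is executed.
                if (e.kind() == Kind.LEAVE
                        && queue.removeIf(q -> q.kind() == Kind.JOIN && q.node().equals(e.node()))) {
                    return;
                }
                queue.add(e);
            } else {
                rebalance(e);
            }
        }

        // Synchronous setter: returns only after the queue has been drained.
        void setSuppressRehashing(boolean value) {
            suppressRehashing = value;
            if (!value) {
                while (!queue.isEmpty()) {
                    rebalance(queue.poll());
                }
            }
        }

        private void rebalance(TopologyEvent e) {
            applied.add(e);                                   // stand-in for a real rebalance
        }

        List<TopologyEvent> applied() { return applied; }
    }

    public static void main(String[] args) {
        Coordinator c = new Coordinator();
        c.setSuppressRehashing(true);
        c.onEvent(new TopologyEvent(Kind.JOIN, "A"));
        c.onEvent(new TopologyEvent(Kind.JOIN, "B"));
        c.onEvent(new TopologyEvent(Kind.LEAVE, "A"));        // cancels A's queued join
        c.setSuppressRehashing(false);                        // drains the queue
        System.out.println(c.applied());                      // only B's join triggers a rebalance
    }
}
```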
> We can batch multiple such exclusions and then re-activate the rehashing.
>
> The parts that need to be implemented are written in italic above.
> Everything else is already there.
>
> excludeNodes is a way of achieving a soft shutdown and should be used only
> if we care about preserving data in the extreme case where the nodes are the
> last/single owners. We can just kill a node directly if we do not care about
> its data.
>
> suppressRehashing is a way of achieving some kind of batching of topology
> changes. This should speed up state transfer a lot, because it avoids a lot
> of pointless reshuffling of data segments when we have many successive
> joiners/leavers.
>
> So what happens if the current coordinator dies for whatever reason? The new
> one will take control and will not have knowledge of the existing rehash
> queue or the previous status of the suppressRehashing attribute, so it will
> just get the current cache membership status from all members of the current
> view and proceed with the rehashing as usual. If the user does not want
> this, he can set a default value of true for suppressRehashing. The admin
> now has to interact via JMX with the new coordinator, but that's not as bad
> as the alternative, where all the nodes are involved in this JMX scheme :)
> I think having only the coordinator involved is a plus.
>
> Manik, how does this fit the full and partial shutdown?
>
> Cheers,
> Adi
>
>
> On 05/31/2013 04:20 PM, Manik Surtani wrote:
>>
>> On 31 May 2013, at 13:52, Dan Berindei <[email protected]> wrote:
>>
>>> If we only want to deal with full cluster shutdown, then I think stopping
>>> all application requests, calling Cache.clear() on one node, and then
>>> shutting down all the nodes should be simpler. On start, assuming no cache
>>> store, the caches will start empty, so starting all the nodes at once and
>>> only allowing application requests when they've all joined should also
>>> work without any extra effort.
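[Editor's note: the admin interaction with the coordinator that Adrian describes could look roughly like the following, using the standard javax.management client API. The service URL, the MBean ObjectName, and the attribute/operation names are all assumptions derived from the proposal, not real Infinispan JMX names.]

```java
import javax.management.Attribute;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical admin-side sketch: suppress rehashing, exclude two nodes,
// then re-enable rehashing on the coordinator's ClusterTopologyManager MBean.
public class ExcludeNodesClient {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://coordinator-host:9999/jmxrmi"); // assumed URL
        try (JMXConnector conn = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection server = conn.getMBeanServerConnection();
            ObjectName topologyManager = new ObjectName(
                    "org.infinispan:type=ClusterTopologyManager");             // illustrative name

            // 1. Queue topology changes instead of rebalancing immediately.
            server.setAttribute(topologyManager,
                    new Attribute("suppressRehashing", Boolean.TRUE));

            // 2. Disown the nodes we want to stop; their data moves elsewhere.
            server.invoke(topologyManager, "excludeNodes",
                    new Object[]{new String[]{"node3", "node4"}},
                    new String[]{String[].class.getName()});

            // 3. Re-enable rehashing; per the proposal, this setter returns
            //    only once the queued changes have been applied.
            server.setAttribute(topologyManager,
                    new Attribute("suppressRehashing", Boolean.FALSE));
        }
    }
}
```

Because only the coordinator honours these calls, an operator would reconnect to the new coordinator's JMX endpoint after a coordinator change.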
>>>
>>> If we only want to stop a part of the cluster, suppressing rebalancing
>>> would be better, because we wouldn't lose all the data. But we'd still
>>> lose the keys whose owners are all among the nodes we want to stop. I've
>>> discussed this with Adrian, and we think that if we want to stop a part of
>>> the cluster without losing data we need a JMX operation on the coordinator
>>> that will "atomically" remove a set of nodes from the CH. After the
>>> operation completes, the user will know it's safe to stop those nodes
>>> without losing data.
>>
>> I think the no-data-loss option is bigger in scope, perhaps part of
>> ISPN-1394. And that's not what I am asking about.
>>
>>> When it comes to starting a part of the cluster, a "pause rebalancing"
>>> option would probably be better - but again, on the coordinator, not on
>>> each joining node. And clearly, if more than numOwners nodes leave while
>>> rebalancing is suspended, data will be lost.
>>
>> Yup. This sort of option would only be used where data loss isn't an issue
>> (such as a distributed cache). Where data loss is an issue, we'd need more
>> control - ISPN-1394.
>>
>>> Cheers,
>>> Dan
>>>
>>> On Fri, May 31, 2013 at 12:17 PM, Manik Surtani <[email protected]> wrote:
>>> Guys,
>>>
>>> We've discussed ISPN-3140 elsewhere before; I'm bringing it to this forum
>>> now.
>>>
>>> https://issues.jboss.org/browse/ISPN-3140
>>>
>>> Any thoughts/concerns? Particularly looking to hear from Dan or Adrian
>>> about viability, complexity, and ease of implementation.
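[Editor's note: Dan's data-loss condition, that a key is lost exactly when every one of its owners is among the stopped nodes, can be illustrated with a toy ownership map. The map and numOwners = 2 here are made up for illustration; a real CH assigns owners per segment.]

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of the data-loss condition: a key becomes unreachable iff the set
// of nodes being stopped contains all of that key's owners.
public class LostKeysCheck {
    static Set<String> lostKeys(Map<String, Set<String>> owners, Set<String> stopping) {
        Set<String> lost = new HashSet<>();
        for (var e : owners.entrySet()) {
            if (stopping.containsAll(e.getValue())) {
                lost.add(e.getKey());
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> owners = Map.of(
                "k1", Set.of("A", "B"),   // numOwners = 2
                "k2", Set.of("B", "C"),
                "k3", Set.of("C", "D"));
        // Stopping B and C removes both owners of k2, so k2 is lost.
        System.out.println(lostKeys(owners, Set.of("B", "C"))); // [k2]
    }
}
```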
>>>
>>> Thanks,
>>> Manik
>>> --
>>> Manik Surtani
>>> [email protected]
>>> twitter.com/maniksurtani
>>>
>>> Platform Architect, JBoss Data Grid
>>> http://red.ht/data-grid

--
Manik Surtani
[email protected]
twitter.com/maniksurtani

Platform Architect, JBoss Data Grid
http://red.ht/data-grid
_______________________________________________
infinispan-dev mailing list
[email protected]
https://lists.jboss.org/mailman/listinfo/infinispan-dev
