Sounds good, although I think ISPN-3140 should remain in its current scope and 
just address point 1 below.  We can create a separate JIRA for point 2, since I 
think even point 1 on its own is useful for some use cases (as you say, where 
data loss isn't a concern).
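
To make point 1 concrete, here is a minimal sketch of the queueing behaviour 
as I read it. It is an illustration only: TopologyChangeQueue, submit() and the 
rebalance counter are hypothetical names, not existing ClusterTopologyManager 
code, and it assumes "cancel each other out" means a join followed by a leave 
of the same node while rehashing is suppressed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of point 1; not the real ClusterTopologyManager. */
class TopologyChangeQueue {
    enum Kind { JOIN, LEAVE }

    private boolean suppressRehashing = false;   // false by default, as proposed
    private final Map<String, Kind> pending = new LinkedHashMap<>();
    private final List<String> members = new ArrayList<>();
    private int rebalances = 0;                  // stands in for real state transfer

    /** Applies the change at once, or queues it while rehashing is suppressed. */
    synchronized void submit(Kind kind, String node) {
        if (!suppressRehashing) {
            apply(node, kind);
            rebalances++;                        // one rebalance per change
            return;
        }
        if (pending.get(node) == Kind.JOIN && kind == Kind.LEAVE) {
            pending.remove(node);                // joined and left: nothing to do
        } else {
            pending.put(node, kind);
        }
    }

    /** Synchronous setter: switching back to false drains the queue first. */
    synchronized void setSuppressRehashing(boolean suppress) {
        suppressRehashing = suppress;
        if (!suppress && !pending.isEmpty()) {
            pending.forEach(this::apply);
            pending.clear();
            rebalances++;                        // one rebalance for the whole batch
        }
    }

    private void apply(String node, Kind kind) {
        if (kind == Kind.JOIN) {
            if (!members.contains(node)) members.add(node);
        } else {
            members.remove(node);
        }
    }

    synchronized List<String> members() { return new ArrayList<>(members); }
    synchronized int rebalances() { return rebalances; }
}
```

With the flag set to true, any number of queued joins and leaves costs a 
single rebalance once it is set back to false, which is the batching benefit 
described below.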

On 31 May 2013, at 17:40, Adrian Nistor <[email protected]> wrote:

> Yes, ISPN-1394 has a broader scope but the proposed solution for ISPN-3140 
> solves quite a lot of ISPN-1394 and it's not complex. We might not even need 
> ISPN-1394 soon unless somebody really wants to control data ownership down to 
> segment granularity. If we only want to batch joins/leaves and manually kick 
> out nodes with or without losing their data then this proposal should be 
> enough. This solution should not prevent implementation of ISPN-1394 in 
> future and will not need to be removed/undone.
> 
> Here are the details:
> 
> 1. Add a JMX writable attribute (or operation?) to ClusterTopologyManager 
> (name it suppressRehashing?) that is false by default but should also be 
> configurable via API or xml. While this attribute is true the 
> ClusterTopologyManager queues all join/leave/exclude(see below) requests and 
> does not execute them on the spot as it would normally happen. The value of 
> this attribute is ignored on all nodes but the coordinator. When it is set 
> back to false all queued operations (except the ones that cancel each other 
> out) are executed. The setter should be synchronous, so when setting it back 
> to false it does not return until the queue is empty and all rehashing has 
> been processed. 
> 
> 2. We add a JMX operation excludeNodes(list of addresses) to 
> ClusterTopologyManager. Calling this method on any node but the coordinator 
> is no-op. This operation removes the node from the topology (almost as if it 
> left) and forces a rebalance. The node is still present in the current CH but 
> not in the pending CH. It's basically disowned by all its data which is now 
> being transferred to other (not excluded) nodes. At the end of the rebalance 
> the node is removed from topology for good and can be shut down without 
> losing data. Note that if suppressRehashing==true, excludeNodes(..) just 
> queues the nodes for later removal. We can batch multiple such exclusions 
> and then re-activate the rehashing.
> 
> The parts that need to be implemented are written in italic above. Everything 
> else is already there.
> 
> excludeNodes is a way of achieving a soft shutdown and should be used only if 
> we care about preserving data in the extreme case where the nodes are the 
> last/single owners. We can just kill the node directly if we do not care 
> about its data. 
> 
> suppressRehashing is a way of achieving some kind of batching of topology 
> changes. This should speed up state transfer a lot because it avoids a lot of 
> pointless reshuffling of data segments when we have many successive 
> joiners/leavers.
> 
> So what happens if the current coordinator dies for whatever reason? The new 
> one will take control and will not have knowledge of the existing rehash 
> queue or the previous status of suppressRehashing attribute so it will just 
> get the current cache membership status from all members of current view and 
> proceed with the rehashing as usual. If the user does not want this he can 
> set a default value of true for suppressRehashing. The admin has to interact 
> now via JMX with the new coordinator. But that's not as bad as the 
> alternative where all the nodes are involved in this JMX scheme :) I think 
> having only the coordinator involved in this is a plus.
> 
> Manik, how does this fit for the full and partial shutdown?
> 
> Cheers
> Adi
> 
> 
> On 05/31/2013 04:20 PM, Manik Surtani wrote:
>> 
>> On 31 May 2013, at 13:52, Dan Berindei <[email protected]> wrote:
>> 
>>> If we only want to deal with full cluster shutdown, then I think stopping 
>>> all application requests, calling Cache.clear() on one node, and then 
>>> shutting down all the nodes should be simpler. On start, assuming no cache 
>>> store, the caches will start empty, so starting all the nodes at once and 
>>> only allowing application requests when they've all joined should also work 
>>> without extra work.
>>> 
>>> If we only want to stop a part of the cluster, suppressing rebalancing 
>>> would be better, because we wouldn't lose all the data. But we'd still lose 
>>> the keys whose owners are all among the nodes we want to stop. I've 
>>> discussed this with Adrian, and we think if we want to stop a part of the 
>>> cluster without losing data we need a JMX operation on the coordinator that 
>>> will "atomically" remove a set of nodes from the CH. After the operation 
>>> completes, the user will know it's safe to stop those nodes without losing 
>>> data.
>> 
>> I think the no-data-loss option is bigger scope, perhaps part of ISPN-1394.  
>> And that's not what I am asking about.
>> 
>>> When it comes to starting a part of the cluster, a "pause rebalancing" 
>>> option would probably be better - but again, on the coordinator, not on 
>>> each joining node. And clearly, if more than numOwner nodes leave while 
>>> rebalancing is suspended, data will be lost.
>> 
>> Yup.  This sort of option would only be used where data loss isn't an issue 
>> (such as a distributed cache).  Where data loss is an issue, we'd need more 
>> control - ISPN-1394.
>> 
>>> 
>>> Cheers
>>> Dan
>>> 
>>> 
>>> 
>>> On Fri, May 31, 2013 at 12:17 PM, Manik Surtani <[email protected]> wrote:
>>> Guys
>>> 
>>> We've discussed ISPN-3140 elsewhere before; I'm bringing it to this forum 
>>> now.
>>> 
>>> https://issues.jboss.org/browse/ISPN-3140
>>> 
>>> Any thoughts/concerns?  Particularly looking to hear from Dan or Adrian 
>>> about viability, complexity, ease of implementation.
>>> 
>>> Thanks
>>> Manik
>>> --
>>> Manik Surtani
>>> [email protected]
>>> twitter.com/maniksurtani
>>> 
>>> Platform Architect, JBoss Data Grid
>>> http://red.ht/data-grid
>>> 
>>> 
>>> _______________________________________________
>>> infinispan-dev mailing list
>>> [email protected]
>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>> 
>> 
>> --
>> Manik Surtani
>> [email protected]
>> twitter.com/maniksurtani
>> 
>> Platform Architect, JBoss Data Grid
>> http://red.ht/data-grid
>> 
>> 
>> 
> 

--
Manik Surtani
[email protected]
twitter.com/maniksurtani

Platform Architect, JBoss Data Grid
http://red.ht/data-grid

