On 11 Jun 2013, at 16:27, Dan Berindei <[email protected]> wrote:

> 
> On Tue, Jun 11, 2013 at 2:01 PM, Manik Surtani <[email protected]> wrote:
> 
> On 10 Jun 2013, at 15:12, Dan Berindei <[email protected]> wrote:
> 
>> Erik, I think in your case you'd be better served by a ConsistentHashFactory 
>> that always assigns at most one owner from each machine for each segment. 
>> 
>> I guess the fix for ISPN-3140 should work as well, but it wouldn't be very 
>> straightforward: you'd have to keep the rebalancingEnabled attribute set to 
>> false by default, and you'd have to enable it temporarily every time you 
>> have a topology change that you do want to process.
> 
> Why?  Does the workflow detailed in ISPN-3140 not work?
> 
> 
> ISPN-3140 is geared toward planned shutdowns, my understanding was that 
> Erik's scenario involves an unexpected failure.
> 
> Say we have a cluster with 4 nodes spread on 2 machines: A(m1), B(m1), C(m2), 
> D(m2).
> If m2 fails, rebalancing will start automatically and m1 will have 2 copies 
> of each entry (one on A and one on B).
> Trying to suspend rebalancing after m2 has already failed won't have any 
> effect - if state transfer is already in progress it won't be cancelled.
> In order to avoid the unnecessary transfers, rebalancing would have to be 
> suspended before the failure - i.e. rebalancing should be suspended by 
> default.
>  
>> It's certainly possible to do this automatically from your app or from a 
>> monitoring daemon, but I'm pretty sure an enhanced topology-aware CHF would 
>> be a better fit.
> 
> Do explain.
> 
> 
> A custom ConsistentHashFactory could distribute segments so that a machine 
> never has more than 1 copy of each segment. If m2 failed, there would be just 
> one machine in the cluster, and just one copy of each segment. The factory 
> would not change the consistent hash, and there wouldn't be any state 
> transfer.

But that's bad for unplanned failures, as you lose data in that case.
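[Editorial illustration] Dan's one-copy-per-machine idea can be modelled with a small, self-contained sketch. This is plain Java, not the real Infinispan `ConsistentHashFactory` API: the `Node` record, `ownersForSegment` method, and the rotation scheme are all hypothetical stand-ins that only demonstrate the placement rule (never two owners of a segment on the same machine), including the degraded case Manik is worried about, where fewer machines simply means fewer copies.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of machine-aware owner placement: pick up to
// numOwners owners per segment, never two nodes on the same machine.
public class MachineAwarePlacement {
    record Node(String name, String machine) {}

    static List<Node> ownersForSegment(List<Node> nodes, int segment, int numOwners) {
        List<Node> owners = new ArrayList<>();
        Set<String> usedMachines = new HashSet<>();
        // Rotate the candidate list per segment for a crude spread of primaries.
        for (int i = 0; i < nodes.size() && owners.size() < numOwners; i++) {
            Node candidate = nodes.get((segment + i) % nodes.size());
            if (usedMachines.add(candidate.machine())) {
                owners.add(candidate); // first node seen on this machine wins
            }
        }
        // With fewer machines than numOwners we simply end up with fewer
        // copies: no duplicate copies on one machine, hence no state transfer
        // when a whole machine fails -- but also no redundancy, which is
        // exactly the data-loss trade-off raised above.
        return owners;
    }

    public static void main(String[] args) {
        List<Node> cluster = List.of(
            new Node("A", "m1"), new Node("B", "m1"),
            new Node("C", "m2"), new Node("D", "m2"));
        // Two owners, one per machine.
        System.out.println(ownersForSegment(cluster, 0, 2));
        // After m2 fails only m1 remains: a single copy per segment.
        List<Node> degraded = List.of(new Node("A", "m1"), new Node("B", "m1"));
        System.out.println(ownersForSegment(degraded, 0, 2));
    }
}
```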

> 
> It could be even simpler - the existing 
> TopologyAwareConsistentHashFactory/TopologyAwareSyncConsistentHashFactory 
> implementations already ensure just one copy per machine if the number of 
> machines is >= numOwners. So a custom ConsistentHashFactory could just extend 
> one of these and skip calling super.rebalance() when the number of machines 
> in the cluster is < numOwners.
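[Editorial illustration] The "skip `super.rebalance()`" guard can be sketched as follows. Again this is plain Java rather than the real factory API: `rebalance`, the `String` consistent-hash stand-in, and the `Member` record are hypothetical; the only point is the guard itself, which counts distinct machines among the members and leaves the current hash untouched (so no state transfer is triggered) whenever that count drops below `numOwners`.

```java
import java.util.List;

// Hypothetical sketch of the guard a custom factory subclass could apply
// before delegating to the inherited rebalance logic.
public class GuardedRebalance {
    record Member(String node, String machine) {}

    static String rebalance(String baseCH, List<Member> members, int numOwners) {
        long machines = members.stream().map(Member::machine).distinct().count();
        if (machines < numOwners) {
            return baseCH;             // degraded: keep the hash, run with fewer copies
        }
        return baseCH + "-rebalanced"; // stand-in for calling super.rebalance(baseCH)
    }

    public static void main(String[] args) {
        // Both surviving nodes are on m1: guard fires, hash is unchanged.
        List<Member> degraded = List.of(new Member("A", "m1"), new Member("B", "m1"));
        System.out.println(rebalance("ch1", degraded, 2));  // prints "ch1"
        // Two machines available again: normal rebalance proceeds.
        List<Member> healthy = List.of(new Member("A", "m1"), new Member("C", "m2"));
        System.out.println(rebalance("ch1", healthy, 2));   // prints "ch1-rebalanced"
    }
}
```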
>  
> 
>> 
>> On Fri, Jun 7, 2013 at 1:45 PM, Erik Salter <[email protected]> wrote:
>> I'd like something similar.  If I have equal keys on two machines (given an
>> orthogonal setup and a TACH), I'd like to suppress state transfer and run
>> with only one copy until I can recover my machines.  The business case is
>> that in a degraded scenario, additional replicas aren't going to buy me
>> anything, as a failure will most likely be at the machine level and will
>> cause me to lose data.  Once I've recovered the other machine, I can turn
>> back on state transfer to get my data redundancy.
>> 
>> Erik
>> 
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Mircea Markus
>> Sent: Tuesday, June 04, 2013 5:44 AM
>> To: infinispan -Dev List
>> Subject: Re: [infinispan-dev] Suppressing state transfer via JMX
>> 
>> Manik, what's wrong with Dan's suggestion with clearing the cache before
>> shutdown?
>> 
>> On 31 May 2013, at 14:20, Manik Surtani <[email protected]> wrote:
>> 
>> >>
>> >> If we only want to deal with full cluster shutdown, then I think stopping
>> all application requests, calling Cache.clear() on one node, and then
>> shutting down all the nodes should be simpler. On start, assuming no cache
>> store, the caches will start empty, so starting all the nodes at once and
>> only allowing application requests when they've all joined should also work
>> without extra work.
>> >>
>> >> If we only want to stop a part of the cluster, suppressing rebalancing
>> would be better, because we wouldn't lose all the data. But we'd still lose
>> the keys whose owners are all among the nodes we want to stop. I've
>> discussed this with Adrian, and we think if we want to stop a part of the
>> cluster without losing data we need a JMX operation on the coordinator that
>> will "atomically" remove a set of nodes from the CH. After the operation
>> completes, the user will know it's safe to stop those nodes without losing
>> data.
>> >
>> > I think the no-data-loss option is bigger scope, perhaps part of
>> ISPN-1394.  And that's not what I am asking about.
>> >
>> >> When it comes to starting a part of the cluster, a "pause rebalancing"
>> option would probably be better - but again, on the coordinator, not on each
>> joining node. And clearly, if more than numOwners nodes leave while
>> rebalancing is suspended, data will be lost.
>> >
>> > Yup.  This sort of option would only be used where data loss isn't an
>> issue (such as a distributed cache).  Where data loss is an issue, we'd need
>> more control - ISPN-1394.
>> >
>> 
>> Cheers,
>> --
>> Mircea Markus
>> Infinispan lead (www.infinispan.org)
>> 
>> 
>> 
>> 
>> 
>> _______________________________________________
>> infinispan-dev mailing list
>> [email protected]
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>> 
> 
> --
> Manik Surtani
> [email protected]
> twitter.com/maniksurtani
> 
> Platform Architect, JBoss Data Grid
> http://red.ht/data-grid
> 
> 


