Hi Erik,

This makes a lot of sense. In fact, I was really close to implementing it
while I was replacing RebalancePolicy with AvailabilityStrategy.
Unfortunately, I hit some problems and had to postpone it (mostly because I
was also trying to make the flag per-cache).
The only question is what happens after a merge, if one partition has
rebalancing enabled and the other has rebalancing disabled. I would prefer
to keep it disabled if at least one partition had it disabled. E.g. if you
start a new node and it doesn't join properly, you wouldn't want it to
trigger a rebalance when it finally finds the cluster, but only after you
enable rebalancing yourself.

Cheers
Dan

On Tue, Oct 28, 2014 at 12:00 AM, Erik Salter <[email protected]> wrote:
> Hi all,
>
> This topic came up in a separate discussion with Mircea, and he suggested
> I post something on the mailing list for a wider audience.
>
> I have a business case where I need the value of the rebalancing flag read
> by the joining nodes. Let's say we have a TACH where we want our keys
> striped across machines, racks, etc. Due to how NBST works, if we start a
> bunch of nodes on one side of the topology marker, we'll end up with the
> case where all keys dog-pile on the first node that joins before being
> disseminated to the other nodes. In other words, the first joining node on
> the other side of the topology acts as a "pivot." That's bad, especially
> if the key is marked as DELTA_WRITE, where the receiving node must pull
> the key from the readCH before applying the changelog.
>
> So not only do we have a single choke point, but it's made worse by the
> initial burst of every write requiring numOwner threads for remote reads.
>
> If we disable rebalancing and start up the nodes on the other side of the
> topology, we can process this in a single view change. But there's a
> catch -- and this is the reason I added the state of the flag. We've run
> into a case where the current coordinator changed (crash or a MERGE) as
> the other nodes were starting up, and the new coordinator was elected from
> the new side of the topology. So we had two separate but balanced CHs on
> both sides of the topology, and data integrity went out the window.
>
> Hence the flag. Note also that this deployment requires the
> awaitInitialTransfer flag to be false.
>
> In a real production environment, this has saved me more times than I can
> count. Node failover/failback is now reasonably deterministic with a
> simple operational procedure for our customer(s) to follow.
>
> The question is whether this feature would be useful for the community.
> Even with the new partition handling, I think this implementation is still
> viable and may warrant inclusion in 7.0 (or 7.1). What does the team
> think? I welcome any and all feedback.
>
> Regards,
>
> Erik Salter
> Cisco Systems, SPVTG
> (404) 317-0693
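To make the merge rule above concrete, here is a minimal sketch in plain
Java (PartitionState is a hypothetical holder, not an Infinispan class) of
how a coordinator could combine the flags of the merging partitions:

    import java.util.Arrays;
    import java.util.List;

    // Sketch of the proposed merge rule: after a merge, rebalancing stays
    // enabled only if every merging partition had it enabled.
    public class RebalanceMergeRuleSketch {

        // Hypothetical per-partition state; not an Infinispan type.
        static class PartitionState {
            final boolean rebalancingEnabled;

            PartitionState(boolean rebalancingEnabled) {
                this.rebalancingEnabled = rebalancingEnabled;
            }
        }

        // One disabled partition is enough to keep the merged cluster disabled.
        static boolean mergedRebalancingEnabled(List<PartitionState> partitions) {
            return partitions.stream().allMatch(p -> p.rebalancingEnabled);
        }

        public static void main(String[] args) {
            List<PartitionState> merging = Arrays.asList(
                    new PartitionState(true), new PartitionState(false));
            System.out.println(mergedRebalancingEnabled(merging)); // prints false
        }
    }

The merged cluster would then stay put until the operator re-enables
rebalancing explicitly, which matches the "don't rebalance until I say so"
intent.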
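And for the deployment Erik describes, a rough sketch of the joiner side,
assuming the programmatic configuration API of that era (the rack/machine
ids and numOwners are placeholders). It only shows awaitInitialTransfer=false
plus the topology hints that feed the TACH; the rebalancing flag itself
would be toggled operationally (e.g. over JMX) before and after starting
the new nodes:

    import org.infinispan.configuration.cache.CacheMode;
    import org.infinispan.configuration.cache.ConfigurationBuilder;
    import org.infinispan.configuration.global.GlobalConfigurationBuilder;
    import org.infinispan.manager.DefaultCacheManager;

    public class JoinerConfigSketch {
        public static void main(String[] args) {
            // Topology hints used by the topology-aware consistent hash.
            GlobalConfigurationBuilder global =
                    GlobalConfigurationBuilder.defaultClusteredBuilder();
            global.transport().rackId("rack-a").machineId("machine-1");

            ConfigurationBuilder cfg = new ConfigurationBuilder();
            cfg.clustering().cacheMode(CacheMode.DIST_SYNC);
            cfg.clustering().hash().numOwners(2);
            // Don't block the joiner waiting for initial state transfer.
            cfg.clustering().stateTransfer().awaitInitialTransfer(false);

            // The node joins immediately; data only moves once the operator
            // re-enables rebalancing.
            DefaultCacheManager cm =
                    new DefaultCacheManager(global.build(), cfg.build());
            cm.getCache();
        }
    }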
_______________________________________________
infinispan-dev mailing list
[email protected]
https://lists.jboss.org/mailman/listinfo/infinispan-dev
