[
https://issues.apache.org/jira/browse/GEODE-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16331263#comment-16331263
]
Kirk Lund commented on GEODE-4250:
----------------------------------
This should probably be exposed to users in two places:
* ResourceManager API
* GFSH command(s)
Would it be valuable to provide some sort of automated redundancy recovery? We
did NOT provide that for rebalancing because it was such an expensive
operation. But for redundancy recovery, it might be acceptable.
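To make the distinction concrete, here is a rough sketch of how a redundancy-only operation could differ from a rebalance. It is a toy in-memory model, not Geode code: the class RedundancySketch and all of its methods are hypothetical names invented for illustration, and the placement policy (put each missing copy on the least-loaded member) is an assumption. The point is only that restoring redundancy creates missing copies and never moves existing ones, while rebalancing restores redundancy first and then moves copies.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model only: none of these names are Geode classes or methods. */
final class RedundancySketch {
    private final int redundantCopies;
    private final Set<String> members = new TreeSet<>();
    /** bucket id -> members currently hosting a copy of that bucket. */
    private final Map<Integer, Set<String>> hosts = new TreeMap<>();

    RedundancySketch(int redundantCopies, int bucketCount, String... initialMembers) {
        this.redundantCopies = redundantCopies;
        for (String m : initialMembers) members.add(m);
        for (int b = 0; b < bucketCount; b++) hosts.put(b, new TreeSet<>());
    }

    void addMember(String m) { members.add(m); }

    /** Simulates losing a peer: every copy it hosted disappears. */
    void removeMember(String m) {
        members.remove(m);
        for (Set<String> copies : hosts.values()) copies.remove(m);
    }

    /** Number of bucket copies hosted by the given member. */
    int load(String m) {
        int n = 0;
        for (Set<String> copies : hosts.values()) if (copies.contains(m)) n++;
        return n;
    }

    boolean isFullyRedundant() {
        for (Set<String> copies : hosts.values())
            if (copies.size() < redundantCopies + 1) return false;
        return true;
    }

    /** Creates missing copies only; existing copies are never moved. */
    int restoreRedundancy() {
        int created = 0;
        for (Set<String> copies : hosts.values()) {
            while (copies.size() < redundantCopies + 1) {
                String target = extreme(copies, false);
                if (target == null) break; // not enough members to host another copy
                copies.add(target);
                created++;
            }
        }
        return created;
    }

    /** Restores redundancy first, then moves copies until loads differ by at most 1. */
    int rebalance() {
        restoreRedundancy();
        int moves = 0;
        while (true) {
            String busiest = extreme(Set.of(), true);
            String idlest = extreme(Set.of(), false);
            if (busiest == null || load(busiest) - load(idlest) <= 1) return moves;
            boolean moved = false;
            for (Set<String> copies : hosts.values()) {
                if (copies.contains(busiest) && !copies.contains(idlest)) {
                    copies.remove(busiest);
                    copies.add(idlest);
                    moves++;
                    moved = true;
                    break;
                }
            }
            if (!moved) return moves;
        }
    }

    /** Most- or least-loaded member not in 'exclude'; null if none qualifies. */
    private String extreme(Set<String> exclude, boolean wantBusiest) {
        String best = null;
        for (String m : members) {
            if (exclude.contains(m)) continue;
            if (best == null
                || (wantBusiest ? load(m) > load(best) : load(m) < load(best))) best = m;
        }
        return best;
    }

    public static void main(String[] args) {
        RedundancySketch s = new RedundancySketch(1, 6, "A", "B", "C");
        s.restoreRedundancy();      // initial placement of all copies
        s.removeMember("C");        // lose a peer
        System.out.println("copies created: " + s.restoreRedundancy());
        s.addMember("D");           // replacement capacity arrives later
        System.out.println("copies moved: " + s.rebalance());
    }
}
```

In this model the redundancy-only step can run as soon as a member is lost, using whatever capacity remains, and the more disruptive move phase can wait until replacement capacity has been added. A real implementation would also have to deal with primaries, redundancy zones, and in-flight updates, none of which are modeled here.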
> Users would like a command to re-establish redundancy without rebalancing
> -------------------------------------------------------------------------
>
> Key: GEODE-4250
> URL: https://issues.apache.org/jira/browse/GEODE-4250
> Project: Geode
> Issue Type: Improvement
> Components: docs, regions
> Reporter: Fred Krone
> Priority: Major
>
> The command would report success only once the system is fully redundant.
> Re-establishing redundancy after the loss of a peer node is typically far
> more urgent and important than achieving better balance. The operational
> impact of rebalancing is also much higher, forcing impacted buckets' updates
> to be distributed to _redundancy-copies + 1_ peer processes and potentially
> spiking p2p connections/threads (and thus load) far beyond normal operations.
> If the system is already close to exhausting available capacity for some
> hardware component, this can be enough to push it over the edge (and may
> force the original fault to recur). This problem is exacerbated when the
> cluster's overall capacity has been reduced due to the loss of a physical
> server. Without the ability to separate the operational tasks of
> re-establishing full data redundancy and rebalancing bucket partitions (that
> are already safely redundant), system administrators may be forced to
> provision replacement capacity _before_ they can restore full service, thus
> increasing downtime unnecessarily.
> For these reasons, we must add the option to execute these operational tasks
> separately.
> It still makes sense for _rebalancing_ ops to first re-establish redundancy,
> so we can keep the existing GFSH command/behavior (it would still be useful
> to clearly log completion of one step before the next one begins). We need a
> new GFSH command/ResourceManager API to execute re-establishment of
> redundancy _without_ rebalancing.