[
https://issues.apache.org/jira/browse/KUDU-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899053#comment-16899053
]
Adar Dembo commented on KUDU-2914:
----------------------------------
Agreed, but this bug description made it sound like an affected tserver is
going to return (i.e. temporary "maintenance") vs. going away forever (i.e.
permanent "decommissioning").
For decommissioning, I agree that moving all of the tserver's replicas to other
tservers makes sense.
For maintenance, I don't think that makes sense. Rereplicating all replicas is
really expensive, and once you've done that there's no interesting state left on the
shut-down tserver; you could just as easily reformat it. Instead, I think a
temporary shutdown should put the affected tablets into a "degraded" mode where
they don't rereplicate. Then when the affected tserver restarts, its replicas
seamlessly rejoin the tablets' consensus groups and no rereplication is needed.
If the affected tables must tolerate an additional fault during this time, they
should be configured to use RF=5 rather than RF=3.
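The RF=5 recommendation follows from the majority rule of Raft, which Kudu uses for tablet consensus: a tablet stays available only while a strict majority of its replicas is up. The minimal Python sketch below works through that arithmetic. It assumes one tserver is offline for maintenance and its replicas are deliberately not re-replicated. The function names are illustrative only and are not part of Kudu.

```python
def max_tolerated_failures(replication_factor: int) -> int:
    """Replica failures a Raft group survives while keeping a strict majority."""
    majority = replication_factor // 2 + 1
    return replication_factor - majority


def extra_faults_during_maintenance(replication_factor: int, replicas_down: int = 1) -> int:
    """Additional unplanned failures a tablet survives while `replicas_down`
    replicas are intentionally offline and not re-replicated.

    A negative result means the tablet has already lost its majority.
    """
    return max_tolerated_failures(replication_factor) - replicas_down


for rf in (3, 5):
    print(f"RF={rf}: tolerates {max_tolerated_failures(rf)} failure(s); "
          f"{extra_faults_during_maintenance(rf)} more while one tserver is in maintenance")
```

With RF=3, one tserver in maintenance already uses up the single failure the tablet can absorb. RF=5 leaves room for one more unplanned fault.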
If you're interested in tackling decommissioning, please take a look at
KUDU-1827.
> Rebalance tool support moving replicas from some specific tablet servers
> ------------------------------------------------------------------------
>
> Key: KUDU-2914
> URL: https://issues.apache.org/jira/browse/KUDU-2914
> Project: Kudu
> Issue Type: Improvement
> Components: CLI
> Reporter: YifanZhang
> Priority: Minor
>
> When we need to stop or upgrade some tablet servers in a Kudu cluster, these
> tservers become unavailable, and the tablets on them are unhealthy for a
> period of time. To ensure high availability of the cluster, it's better to
> move all replicas on these tservers to other tservers in the cluster before
> stopping or upgrading them. This could be achieved by having the rebalance
> tool support a 'blacklist_tservers' option.
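The 'blacklist_tservers' idea amounts to planning one move for every replica hosted on a listed tserver. Each move targets a tserver that is not blacklisted and does not already host that tablet. The minimal Python sketch below illustrates that planning step under explicit assumptions. `plan_moves`, the placement dictionary, and the "least-loaded destination, ties broken by name" rule are hypothetical. They do not describe the Kudu rebalancer's actual algorithm or CLI, and the sketch ignores rack awareness, leader placement, and the mechanics of copying a replica.

```python
from collections import Counter

Move = tuple[str, str, str]  # (tablet_id, source_tserver, destination_tserver)


def plan_moves(placement: dict[str, set[str]], tservers: set[str],
               blacklist: set[str]) -> list[Move]:
    """Plan replica moves off blacklisted tservers (hypothetical helper).

    `placement` maps each tablet ID to the set of tserver IDs hosting one of
    its replicas. The input is not modified.
    """
    # Current replica count per tserver, used to choose the least-loaded destination.
    load = Counter({ts: 0 for ts in tservers})
    for hosts in placement.values():
        load.update(hosts)

    candidates = tservers - blacklist
    moves: list[Move] = []
    for tablet_id in sorted(placement):
        hosts = set(placement[tablet_id])  # copy so the caller's data is untouched
        for src in sorted(hosts & blacklist):
            # A destination must not be blacklisted and must not already host this tablet.
            eligible = candidates - hosts
            if not eligible:
                raise RuntimeError(f"no destination for tablet {tablet_id} replica on {src}")
            dst = min(eligible, key=lambda ts: (load[ts], ts))
            moves.append((tablet_id, src, dst))
            # Update the simulated placement and load so later choices see this move.
            hosts.remove(src)
            hosts.add(dst)
            load[src] -= 1
            load[dst] += 1
    return moves
```

For example, with tservers ts1 to ts4, tablet t1 on {ts1, ts2, ts3}, tablet t2 on {ts1, ts2, ts4}, and ts1 blacklisted, the plan moves t1's replica to ts4 and t2's replica to ts3. Updating the simulated load after each move keeps later choices from piling every replica onto one destination.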