[ 
https://issues.apache.org/jira/browse/KUDU-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899053#comment-16899053
 ] 

Adar Dembo commented on KUDU-2914:
----------------------------------

Agreed, but this bug description made it sound like the affected tserver is 
going to return (i.e. temporary "maintenance") rather than go away forever 
(i.e. permanent "decommissioning").

For decommissioning, I agree that moving all of the tserver's replicas to other 
tservers makes sense.

For maintenance, I don't think that makes sense. Rereplicating all replicas is 
really expensive, and once you've done that there's no interesting state left 
on the shut-down tserver; you could just as easily reformat it. Instead, I think a 
temporary shutdown should put the affected tablets into a "degraded" mode where 
they don't rereplicate. Then when the affected tserver restarts, its replicas 
seamlessly rejoin the tablets' consensus groups and no rereplication is needed. 
If the affected tables must tolerate an additional fault during this time, they 
should be configured to use RF=5 rather than RF=3.
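
To make the RF=5 vs. RF=3 point concrete, here is a rough sketch of the 
arithmetic. It assumes plain majority-quorum Raft and nothing more; the helper 
name faults_tolerated is hypothetical and is not part of Kudu.

```python
def faults_tolerated(replication_factor: int, replicas_down: int = 0) -> int:
    """Additional replica failures a tablet can absorb while keeping a majority.

    Assumption: simple majority quorum, i.e. floor(RF / 2) + 1 live voters
    are needed for the tablet to stay available.
    """
    majority = replication_factor // 2 + 1
    live = replication_factor - replicas_down
    return max(live - majority, 0)


for rf in (3, 5):
    print(
        f"RF={rf}: fully healthy tolerates {faults_tolerated(rf)} fault(s); "
        f"with one tserver down for maintenance, {faults_tolerated(rf, 1)}"
    )
```

Under that assumption, an RF=3 tablet with one replica offline for maintenance 
cannot survive another failure, while an RF=5 tablet in the same state can 
still lose one more replica.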

If you're interested in tackling decommissioning, please take a look at 
KUDU-1827.

> Rebalance tool support moving replicas from some specific tablet servers
> ------------------------------------------------------------------------
>
>                 Key: KUDU-2914
>                 URL: https://issues.apache.org/jira/browse/KUDU-2914
>             Project: Kudu
>          Issue Type: Improvement
>          Components: CLI
>            Reporter: YifanZhang
>            Priority: Minor
>
> When we need to stop or upgrade some tablet servers in a Kudu cluster, these 
> tservers become unavailable and the tablets on them are unhealthy for a 
> period of time. To preserve the cluster's high availability, it would be 
> better to first move all replicas on these tservers to other tservers in the 
> cluster, and then stop or upgrade them. This could be achieved by having the 
> rebalance tool support specifying 'blacklist_tservers'.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
