[
https://issues.apache.org/jira/browse/KUDU-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942577#comment-16942577
]
Alexey Serbin commented on KUDU-2914:
-------------------------------------
Thank you [~zhangyifan27] for your work on this useful feature!
In theory, once the tserver is marked with the special {{decommissioned}} flag,
the master won't place any new replicas on that tablet server. So, once the
tablet server is in {{decommissioned}} mode, it's safe to get the list of
tablet replicas on the tserver and mark each with the {{REPLACE}} attribute.
After that it's necessary to fetch the list of tablet replicas at the tablet
server from time to time (e.g., using {{ListTablets()}} RPC as in
{{RemoteKsckTabletServer::FetchInfo()}}) and wait until no replicas are left
there.
Yes, it's possible to mark any number of replicas at a tablet server with the
{{REPLACE}} attribute.
It should be possible to decommission multiple tablet servers at once, yes.
However, marking a server as {{decommissioned}} isn't implemented yet. As a
temporary workaround, I think it's possible to put tablet servers into the
maintenance mode (see
[KUDU-2069|https://issues.apache.org/jira/browse/KUDU-2069]) instead of marking
them {{decommissioned}}. When a tablet server is put into the maintenance
mode, master doesn't place any replicas on it, but it's still possible to move
tablet replicas from it. See [this
commit|https://github.com/apache/kudu/commit/5316a89dfd13c36eef078b32043f161e6d0bbf01]
for details.
So, before proper decommissioning is implemented, the procedure of moving all
replicas from a tablet server could be the following:
# Put the tablet server into the maintenance mode.
# Mark all the replicas at the tablet server with the {{REPLACE}} attribute.
# Periodically retrieve the list of tablet replicas at the server.
# Once all the replicas are gone from the server, shut it down.
# Switch the tablet server from the maintenance mode back to the normal mode.
# Remove the tablet server from the cluster.
# Declare victory :)
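
The first three steps of the procedure above, plus the wait in step 4, can be sketched as a small driver loop. This is only an illustration under stated assumptions: the callables {{enter_maintenance}}, {{list_replicas}} and {{mark_replace}} are hypothetical placeholders for whatever mechanism is actually used to talk to the cluster (for example, a wrapper around the {{ListTablets()}} RPC); they are not Kudu APIs, and the default poll interval is arbitrary.

```python
import time
from typing import Callable, List


def drain_tablet_server(
    ts_id: str,
    enter_maintenance: Callable[[str], None],    # hypothetical: step 1
    list_replicas: Callable[[str], List[str]],   # hypothetical: tablet ids hosted on the tserver
    mark_replace: Callable[[str, str], None],    # hypothetical: step 2, per replica
    poll_interval_s: float = 30.0,               # assumed value, tune as needed
    max_polls: int = 1000,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Move all replicas off one tablet server; return the number of polls made."""
    # Step 1: stop the master from placing new replicas on this server.
    enter_maintenance(ts_id)

    # Step 2: the replica list can no longer grow, so mark every replica once.
    for tablet_id in list_replicas(ts_id):
        mark_replace(ts_id, tablet_id)

    # Steps 3-4: poll until the server hosts no replicas.
    for polls in range(1, max_polls + 1):
        if not list_replicas(ts_id):
            return polls
        sleep(poll_interval_s)

    raise TimeoutError(f"replicas still present on {ts_id} after {max_polls} polls")
```

Shutting the server down, switching it out of the maintenance mode and removing it from the cluster (steps 4 to 6) are left to the operator in this sketch. Draining several tablet servers at once would amount to running the same loop per server after putting all of them into the maintenance mode first.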
> Rebalance tool support moving replicas from some specific tablet servers
> ------------------------------------------------------------------------
>
> Key: KUDU-2914
> URL: https://issues.apache.org/jira/browse/KUDU-2914
> Project: Kudu
> Issue Type: Improvement
> Components: CLI
> Reporter: YifanZhang
> Assignee: YifanZhang
> Priority: Minor
>
> When we need to remove some tservers from a kudu cluster (maybe just for
> saving resources or replacing these servers with new servers), it's better to
> move all replicas on these tservers to other tservers in a cluster in
> advance, instead of waiting for all replicas kicked out and evicting new
> replicas. This can be achieved by rebalance tool supporting specifying
> 'blacklist_tservers'.