[
https://issues.apache.org/jira/browse/KUDU-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905615#comment-16905615
]
Andrew Wong commented on KUDU-2914:
-----------------------------------
I put together a doc summarizing my thoughts on this and a few other features:
[https://docs.google.com/document/d/12BZqspGjHvQlc-o8XTDixoRol9Q36WJzXLJ6p15Zhf0],
and I discussed this a bit on this Gerrit patch:
[https://gerrit.cloudera.org/c/14048/].
I think the end-to-end process of decommissioning boils down to three pieces:
# Mark a tablet server as decommissioning so that no new replicas are placed
on it. I think this is KUDU-1827. I'm working on something similar right now
(maintenance mode).
# Drain all replicas away from the tablet server. I think that is this ticket,
KUDU-2914.
# Once empty, (either automatically or with a tool) indicate that the tablet
server has been successfully decommissioned, removing the tserver from the
master's in-memory set of tservers. I think this is KUDU-2915.
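To make the three pieces concrete, here is a minimal toy sketch in Python. It is
not Kudu code: `ToyCluster` and its methods are hypothetical names, the cluster
is just an in-memory dict, and the "least-loaded eligible tserver" placement
rule is an assumption chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ToyCluster:
    """Hypothetical in-memory model; not a Kudu API."""
    # tserver id -> ids of the tablets that have a replica on that tserver
    replicas: Dict[str, Set[str]] = field(default_factory=dict)
    # tservers that must not receive new replicas
    decommissioning: Set[str] = field(default_factory=set)

    def mark_decommissioning(self, ts: str) -> None:
        """Piece 1: exclude `ts` from future replica placement."""
        self.decommissioning.add(ts)

    def drain(self, ts: str) -> None:
        """Piece 2: move every replica off `ts` onto an eligible tserver."""
        for tablet in sorted(self.replicas[ts]):
            candidates = [
                t for t in self.replicas
                if t != ts
                and t not in self.decommissioning
                and tablet not in self.replicas[t]  # one replica per tserver
            ]
            if not candidates:
                raise RuntimeError(f"no eligible destination for tablet {tablet}")
            # Assumed placement rule: least-loaded tserver, ties broken by id.
            dest = min(candidates, key=lambda t: (len(self.replicas[t]), t))
            self.replicas[dest].add(tablet)
            self.replicas[ts].discard(tablet)

    def remove_if_empty(self, ts: str) -> bool:
        """Piece 3: forget `ts` only if it is decommissioning and empty."""
        if ts not in self.decommissioning or self.replicas[ts]:
            return False
        del self.replicas[ts]
        self.decommissioning.discard(ts)
        return True


cluster = ToyCluster({"ts1": {"a", "b"}, "ts2": {"a"}, "ts3": {"b"}, "ts4": set()})
cluster.mark_decommissioning("ts1")
cluster.drain("ts1")
removed = cluster.remove_if_empty("ts1")
```

The point of the sketch is only the ordering: the mark has to come first so the
drain never picks the departing tserver as a destination, and the removal is
gated on the tserver actually being empty.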
Ideally, all of this functionality would be baked into a single tool, but given
the spread of tasks, maybe it's better to keep the work separate. There is
precedent for introducing tools whose functionality later gets subsumed by
another tool. `kudu fs list` comes to mind as a tool that is redundant with
some of the `kudu local_replica` tooling. While redundant tooling may be
confusing for operators, maybe good documentation can make it less confusing.
For instance, I can imagine #2 and #3 being introduced as separate tools, and
then, once all three pieces are complete, the entire process being wrapped in
a single decommissioning tool. I don't have a strong opinion either way.
> Rebalance tool support moving replicas from some specific tablet servers
> ------------------------------------------------------------------------
>
> Key: KUDU-2914
> URL: https://issues.apache.org/jira/browse/KUDU-2914
> Project: Kudu
> Issue Type: Improvement
> Components: CLI
> Reporter: YifanZhang
> Priority: Minor
>
> When we need to remove some tservers from a Kudu cluster (maybe just to
> save resources or to replace these servers with new ones), it's better to
> move all replicas on these tservers to other tservers in the cluster in
> advance, instead of waiting for all of the replicas to be kicked out and
> replaced by new replicas. This could be achieved by having the rebalance
> tool support specifying 'blacklist_tservers'.