[ 
https://issues.apache.org/jira/browse/KUDU-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905615#comment-16905615
 ] 

Andrew Wong commented on KUDU-2914:
-----------------------------------

I put together a doc summarizing my thoughts on this and a few other features: 
[https://docs.google.com/document/d/12BZqspGjHvQlc-o8XTDixoRol9Q36WJzXLJ6p15Zhf0],
 and I discussed this a bit on this Gerrit patch: 
[https://gerrit.cloudera.org/c/14048/].

I think the end-to-end process of decommissioning boils down to three pieces:
 # Mark a tablet server as decommissioning so that no new replicas are placed 
onto that tablet server. I think this is KUDU-1827. I'm working on something 
similar right now (maintenance mode).
 # Drain all replicas away from the tablet server. I think that is this ticket, 
KUDU-2914.
 # Once the tablet server is empty, indicate (either automatically or with a 
tool) that it has been successfully decommissioned, removing the tserver from 
the master's in-memory set of tservers. I think this is KUDU-2915.
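
To make the three pieces concrete, here is a minimal sketch of the flow as a toy in-memory model. It is not Kudu code: the `Cluster` class, its method names, and the least-loaded placement rule are all hypothetical and only illustrate the ordering of mark, drain, and remove.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model (hypothetical, not a Kudu API): tserver -> tablet IDs it hosts."""
    replicas: dict[str, set[str]]
    decommissioning: set[str] = field(default_factory=set)

    def mark_decommissioning(self, tserver: str) -> None:
        # Piece 1: exclude the tserver from future replica placement.
        self.decommissioning.add(tserver)

    def placement_candidates(self, tablet: str) -> list[str]:
        # Eligible targets are not decommissioning and do not already
        # host a replica of this tablet.
        return [ts for ts, held in self.replicas.items()
                if ts not in self.decommissioning and tablet not in held]

    def drain(self, tserver: str) -> None:
        # Piece 2: move each replica to the least-loaded eligible tserver
        # (an assumed placement rule, chosen only for illustration).
        for tablet in sorted(self.replicas[tserver]):
            candidates = self.placement_candidates(tablet)
            if not candidates:
                raise RuntimeError(f"no eligible target for tablet {tablet}")
            target = min(candidates, key=lambda ts: (len(self.replicas[ts]), ts))
            self.replicas[target].add(tablet)
            self.replicas[tserver].discard(tablet)

    def remove(self, tserver: str) -> None:
        # Piece 3: forget the tserver only once it hosts no replicas.
        if self.replicas[tserver]:
            raise RuntimeError(f"{tserver} still hosts replicas")
        del self.replicas[tserver]
        self.decommissioning.discard(tserver)
```

Calling `mark_decommissioning`, `drain`, and `remove` in that order on the same tserver walks through the whole process; calling `remove` before `drain` fails in this model, which mirrors the "once empty" condition in the third piece.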

Ideally, all of this functionality would be baked into a single tool, but given 
the spread of tasks, maybe it's better to keep the work separate. There is 
precedent for introducing tools whose functionality gets subsumed by 
another tool. `kudu fs list` comes to mind as a tool that is redundant with 
some of the `kudu local_replica` tooling. While redundant tooling may be 
confusing for operators, maybe good documentation can make it less confusing.

For instance, I can imagine #2 and #3 being introduced as separate tools, and 
then, once all three pieces are complete, the entire process being wrapped in 
a single decommissioning tool. I don't have a strong opinion either way.

> Rebalance tool support moving replicas from some specific tablet servers
> ------------------------------------------------------------------------
>
>                 Key: KUDU-2914
>                 URL: https://issues.apache.org/jira/browse/KUDU-2914
>             Project: Kudu
>          Issue Type: Improvement
>          Components: CLI
>            Reporter: YifanZhang
>            Priority: Minor
>
> When we need to remove some tservers from a Kudu cluster (maybe just to 
> save resources or to replace these servers with new servers), it's better to 
> move all replicas on these tservers to other tservers in the cluster in 
> advance, instead of waiting for all replicas to be kicked out and new 
> replicas to be created. This could be achieved by having the rebalance tool 
> support specifying 'blacklist_tservers'.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
