[
https://issues.apache.org/jira/browse/KUDU-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Wong resolved KUDU-2548.
-------------------------------
Fix Version/s: 1.10.0
Resolution: Fixed
Users can run the rebalancer with permanently dead tservers by running with the
`--ignore_tservers` option.
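
To illustrate the behavior change, here is a minimal sketch of a pre-flight
check that refuses to rebalance when a tablet server is down, unless that
server is on an explicit ignore list. This is not Kudu's actual code: the
`TabletServer` type, the function names, and the sample UUIDs are all
hypothetical, and it assumes the ignore list identifies servers by UUID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TabletServer:
    # Hypothetical stand-in for a tablet server's health summary.
    uuid: str
    healthy: bool


def servers_blocking_rebalance(servers, ignored_uuids=()):
    """Return UUIDs of unhealthy servers that are not explicitly ignored."""
    ignored = set(ignored_uuids)
    return [s.uuid for s in servers if not s.healthy and s.uuid not in ignored]


def can_rebalance(servers, ignored_uuids=()):
    """Rebalancing may proceed only if no non-ignored server is down."""
    return not servers_blocking_rebalance(servers, ignored_uuids)


# Sample cluster with one permanently dead server (made-up identifiers).
cluster = [
    TabletServer("ts-a", healthy=True),
    TabletServer("ts-b", healthy=True),
    TabletServer("ts-dead", healthy=False),
]
```

With no ignore list, `can_rebalance(cluster)` is false because `ts-dead` is
down; passing `["ts-dead"]` as `ignored_uuids` lets the check pass.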
> Rebalancer tool should be able to run even if there are permanently dead
> tablet servers
> ---------------------------------------------------------------------------------------
>
> Key: KUDU-2548
> URL: https://issues.apache.org/jira/browse/KUDU-2548
> Project: Kudu
> Issue Type: Improvement
> Affects Versions: 1.7.1
> Reporter: William Berkeley
> Assignee: William Berkeley
> Priority: Major
> Fix For: 1.10.0
>
>
> The rebalancer will bail as soon as it sees a down tablet server, including
> at the beginning before it does rebalancing. There are a few reasons for this:
> 1. Rebalancing shouldn't fight with re-replication. If a tablet server is
> down for a while, all its replicas will need to be re-replicated. Since
> rebalancing is greedy and can be interrupted or resumed anytime, it's better
> to exit, allow re-replication to occur, and then resume rebalancing.
> 2. It's more complicated to figure out how to balance correctly with a greedy
> algorithm if tablet servers can come and go, since coming and going changes
> the balance state of the cluster. We allow TS to join the cluster and will
> begin to move replicas there, but if we allow TS to go down we ought to think
> about handling if they come back. It's easier to leave solving this problem
> for when rebalancing and re-replication are somewhat unified in the master.
> Nevertheless, it's a bummer that if, e.g., a user decom'd a tserver 3 months
> ago, the rebalancer won't run because the rebalancer's ksck says a tserver is
> unavailable. We can fix this very cleanly once proper decommissioning has
> been implemented: with a distinction between "gone missing" and
> "decommissioned", we can have the RB tool (really ksck) ignore decom'd
> servers.