[
https://issues.apache.org/jira/browse/KUDU-3346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17464857#comment-17464857
]
Georgiana Ogrean edited comment on KUDU-3346 at 12/24/21, 1:57 AM:
-------------------------------------------------------------------
In case it helps with getting to the bottom of this:
After noticing that some logs appear twice for tservers in us-east-1c, e.g.
{code:java}
I1223 13:52:53.569551 11613 rebalancer.cc:305] found tserver
ca2b022920654fd2aacd320adfe39148 at location '/us-east-1/us-east-1c'{code}
I tried putting a tserver in that region into maintenance mode and then running
rebalance with the same flags. It fails with the same error as above, but the
output differs. For tservers in the other two regions in our cluster, all it
printed before failing was the *Locations load summary* table, whereas when
ignoring a tserver in us-east-1c it also prints the *replica distribution
summary* tables for that region (both per-server and per-table). I attached the
rebalance log from a run with a tserver in us-east-1c put in maintenance and
then ignored; note the duplicate log messages towards the end.
[^rebalance_ignored_tserver_1c.log.Z]
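A minimal sketch for spotting the duplicated messages, assuming the log has been decompressed and that each "found tserver" message sits on a single line in the file (it is only wrapped in this comment). The function name {{find_duplicate_tservers}} is hypothetical and not part of the kudu CLI; it only uses standard grep, sort, uniq and awk.

```shell
# Hypothetical helper, not part of Kudu: list every
# "found tserver <uuid> at location '<location>'" message that occurs more
# than once, printed as "<count> <uuid> <location>".
# Reads the log file given as the first argument, or stdin when none is given.
find_duplicate_tservers() {
  grep -o "found tserver [0-9a-f]* at location '[^']*'" "$@" \
    | sort \
    | uniq -c \
    | awk '$1 > 1 { print $1, $4, $7 }'
}
```

It could be run as {{zcat rebalance_ignored_tserver_1c.log.Z | find_duplicate_tservers}}, assuming a zcat that handles .Z files; an empty result would mean no tserver was reported more than once.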
> Rebalance fails when trying to decommission tserver on a rack-aware cluster
> ---------------------------------------------------------------------------
>
> Key: KUDU-3346
> URL: https://issues.apache.org/jira/browse/KUDU-3346
> Project: Kudu
> Issue Type: Bug
> Affects Versions: 1.15.0
> Reporter: Georgiana Ogrean
> Priority: Major
> Attachments: rebalance_ignored_tserver_1c.log.Z, rebalance_v1.log.Z
>
>
> When following the steps [in the
> docs|https://docs.cloudera.com/runtime/7.2.0/administering-kudu/topics/kudu-decommissioning-or-permanently-removing-tablet-server-from-cluster.html]
> for decommissioning a tserver, the rebalance job fails with:
> {code:java}
> Invalid argument: ignored tserver <tserver_uuid> is not reported among know
> tservers
> {code}
> Steps followed:
> 1. Checked that ksck passes.
> 2. Put the tserver to be decommissioned in maintenance mode.
> {code:java}
> sudo -u kudu kudu tserver state enter_maintenance $MASTER_ADDRESSES \
>   5ae499b1b870419daabb0e8da90ef233 {code}
> 3. Ran rebalance with the {{-ignored_tservers}} and
> {{-move_replicas_from_ignored_tservers}} flags.
> {code:java}
> sudo -u kudu kudu cluster rebalance $MASTER_ADDRESSES \
>   -move_replicas_from_ignored_tservers \
>   -ignored_tservers=5ae499b1b870419daabb0e8da90ef233 -v=1{code}
> The logs for the rebalance command are attached.
>