[jira] [Comment Edited] (KUDU-3346) Rebalance fails when trying to decommission tserver on a rack-aware cluster

Georgiana Ogrean (Jira) Thu, 23 Dec 2021 17:58:07 -0800


    [ 
https://issues.apache.org/jira/browse/KUDU-3346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17464857#comment-17464857
 ]


Georgiana Ogrean edited comment on KUDU-3346 at 12/24/21, 1:57 AM:
-------------------------------------------------------------------

In case it helps with getting to the bottom of this:

After noticing that some logs appear twice for tservers in us-east-1c, e.g.
{code:java}
I1223 13:52:53.569551 11613 rebalancer.cc:305] found tserver ca2b022920654fd2aacd320adfe39148 at location '/us-east-1/us-east-1c'
{code}
I tried putting a tserver in that region into maintenance mode and then running 
rebalance with the same flags. It fails with the same error as above, but with 
one difference: for the other two regions in our cluster, all it printed before 
failing was the *Locations load summary* table, whereas when ignoring a tserver 
in us-east-1c it also prints the *replica distribution summary* tables for that 
region (both per-server and per-table). I attached the rebalance log from a run 
with a tserver in us-east-1c ignored after being put in maintenance; notice the 
duplicate log messages towards the end.

[^rebalance_ignored_tserver_1c.log.Z] 
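
A rough sketch for spotting the repeated messages, assuming the attached log has been uncompressed to a local file. The function name {{count_found_tservers}} and the file path are placeholders for illustration, not part of Kudu. It strips the glog prefix so that differing timestamps do not hide duplicates, then lists every "found tserver" message that occurs more than once:

```shell
# Hypothetical helper (not a Kudu tool): list "found tserver" messages that
# appear more than once in a rebalancer log, ignoring the glog prefix.
count_found_tservers() {
  # $1: path to an uncompressed rebalancer log (placeholder path)
  grep -F 'found tserver' "$1" \
    | sed 's/^.*\] //' \
    | sort \
    | uniq -c \
    | awk '$1 > 1'   # keep only messages seen at least twice
}
```

Running it as {{count_found_tservers rebalance.log}} prints one line per repeated message, prefixed with the number of occurrences; no output means no duplicates were found.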

 


was (Author: JIRAUSER282556):
In case it helps with getting to the bottom of this:

After noticing that some logs appear twice for tservers in us-east-1c, e.g.
{code:java}
I1223 13:52:53.569551 11613 rebalancer.cc:305] found tserver ca2b022920654fd2aacd320adfe39148 at location '/us-east-1/us-east-1c'
{code}
I tried placing in maintenance a tserver in that region and then running 
rebalance with the same flags. It fails with the same error as above, but while 
for the other two regions in our cluster all it printed before failing was the 
*Locations load summary* table, when ignoring a tserver in us-east-1c it also 
prints the *replica distribution summary* tables for that region (both 
per-server and per-table). I attached the rebalance log file when the job is 
run with a tserver in us-east-1c ignored after being put in maintenance.

[^rebalance_ignored_tserver_1c.log.Z] 

 

> Rebalance fails when trying to decommission tserver on a rack-aware cluster
> ---------------------------------------------------------------------------
>
>                 Key: KUDU-3346
>                 URL: https://issues.apache.org/jira/browse/KUDU-3346
>             Project: Kudu
>          Issue Type: Bug
>    Affects Versions: 1.15.0
>            Reporter: Georgiana Ogrean
>            Priority: Major
>         Attachments: rebalance_ignored_tserver_1c.log.Z, rebalance_v1.log.Z
>
>
> When following the steps [in the 
> docs|https://docs.cloudera.com/runtime/7.2.0/administering-kudu/topics/kudu-decommissioning-or-permanently-removing-tablet-server-from-cluster.html]
>  for decommissioning a tserver, the rebalance job fails with:
> {code:java}
> Invalid argument: ignored tserver <tserver_uuid> is not reported among know tservers
> {code}
> Steps followed:
> 1. Checked that ksck passes.
> 2. Put the tserver to be decommissioned in maintenance mode.
> {code:java}
> sudo -u kudu kudu tserver state enter_maintenance $MASTER_ADDRESSES 5ae499b1b870419daabb0e8da90ef233
> {code}
> 3. Ran rebalance with {{-ignored_tservers}} and 
> {{-move_replicas_from_ignored_tservers}} flags.
> {code:java}
> sudo -u kudu kudu cluster rebalance $MASTER_ADDRESSES -move_replicas_from_ignored_tservers -ignored_tservers=5ae499b1b870419daabb0e8da90ef233 -v=1
> {code}
> The logs for the rebalance command are attached.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
