[ https://issues.apache.org/jira/browse/HBASE-21266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16638611#comment-16638611 ]
Andrew Purtell commented on HBASE-21266:
----------------------------------------
bq. "Number of dead servers in processing should always be non-negative"
You are looking at that assert in DeadServer#finish, right? Asserts aren't
evaluated unless the JVM is started with the -ea command line flag, which I
didn't do.
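To illustrate why the assert stays silent here, a minimal sketch follows. The class and method names are hypothetical stand-ins, not the actual DeadServer code; only the assert message is taken from the quote above.

```java
// Hypothetical stand-in for the accounting in DeadServer; not the HBase source.
final class AssertDemo {
    static int numProcessing = 0;

    // Returns true only when the JVM was started with -ea: the assignment
    // inside the assert statement executes only if assertions are enabled.
    static boolean assertionsEnabled() {
        boolean enabled = false;
        assert enabled = true;
        return enabled;
    }

    // Decrements the counter, then checks it with a plain assert.
    // Without -ea the check is skipped and a negative count goes unnoticed.
    static int finish() {
        numProcessing--;
        assert numProcessing >= 0
            : "Number of dead servers in processing should always be non-negative";
        return numProcessing;
    }
}
```

Run without -ea, finish() here quietly returns a negative count. Run with -ea, the same call throws an AssertionError.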
The log line I did see shows that the dead server map was empty at the time,
so I agree we should look at the accounting in DeadServer.java.
"Not running balancer because processing dead regionserver(s)" is printed from
HMaster.java:1846 based on the result from
ServerManager#areDeadServersInProgress, which passes through the result from
DeadServer#areDeadServersInProgress, which is simply
{code}
public synchronized boolean areDeadServersInProgress() { return processing; }
{code}
This boolean is cleared in DeadServer#finish when
{code}
if (numProcessing == 0) { processing = false; }
{code}
So the first question I have is why do we even need this boolean field? It can
be derived cheaply from other state. In areDeadServersInProgress just return
the result of {{numProcessing != 0}}.
That assert you observed should be replaced by use of Preconditions so we will
get a RuntimeException that will get noticed.
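A rough sketch of both suggestions together, under stated assumptions: the class name DeadServerSketch and the method startProcessing are hypothetical, and the real DeadServer tracks more state than this. The explicit check stands in for a Guava-style Preconditions.checkState call and is inlined so the example has no dependencies. The in-progress state is derived from the counter, so it is true exactly while the counter is nonzero.

```java
// Hypothetical sketch; not the actual DeadServer implementation.
final class DeadServerSketch {
    // Number of dead servers currently being processed.
    private int numProcessing = 0;

    // Assumed entry point: called when processing of a dead server begins.
    public synchronized void startProcessing() {
        numProcessing++;
    }

    // Called when processing of a dead server completes. The check runs
    // regardless of -ea, so an unmatched finish fails loudly.
    public synchronized void finish() {
        if (numProcessing <= 0) {
            throw new IllegalStateException(
                "Number of dead servers in processing should always be non-negative");
        }
        numProcessing--;
    }

    // Derived from the counter; no separate boolean field to keep in sync.
    public synchronized boolean areDeadServersInProgress() {
        return numProcessing != 0;
    }
}
```

With the flag derived this way, the balancer check cannot report processing in progress after the counter has returned to zero, and an accounting bug surfaces as an exception instead of a stuck state.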
> Not running balancer because processing dead regionservers, but empty dead rs
> list
> ----------------------------------------------------------------------------------
>
> Key: HBASE-21266
> URL: https://issues.apache.org/jira/browse/HBASE-21266
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.4.8
> Reporter: Andrew Purtell
> Priority: Major
> Fix For: 1.5.0, 1.4.9
>
>
> Found during ITBLL testing. AM in master gets into a state where manual
> attempts from the shell to run the balancer always return false and this is
> printed in the master log:
> 2018-10-03 19:17:14,892 DEBUG
> [RpcServer.default.FPBQ.Fifo.handler=21,queue=0,port=8100] master.HMaster:
> Not running balancer because processing dead regionserver(s):
> Note the empty list.
> This errant state did not recover without intervention by way of master
> restart, but the test environment was chaotic so needs investigation.