[ 
https://issues.apache.org/jira/browse/HBASE-21266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16638611#comment-16638611
 ] 

Andrew Purtell edited comment on HBASE-21266 at 10/4/18 5:48 PM:
-----------------------------------------------------------------

bq.  "Number of dead servers in processing should always be non-negative"

You are looking at that assert in DeadServer#finish, right? Asserts aren't 
evaluated unless the JVM is started with the -ea command line flag, which I 
didn't do. 
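
For anyone who wants to confirm whether asserts are active in a given JVM, here is a minimal standalone sketch. It is not HBase code, and the class name AssertStatusCheck is made up for illustration. It uses the usual idiom of putting a side effect inside an assert statement, which only runs when assertions are enabled.

```java
public class AssertStatusCheck {
  /** Returns true only if this class is running with assertions enabled (-ea). */
  static boolean assertionsEnabled() {
    boolean enabled = false;
    // The assignment is a side effect of the assert expression, so it is
    // executed only when the JVM actually evaluates asserts for this class.
    assert enabled = true;
    return enabled;
  }

  public static void main(String[] args) {
    System.out.println("assertions enabled: " + assertionsEnabled());
  }
}
```

Run it once with plain java and once with java -ea to see the difference.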

The log line I did see shows that the dead server map was empty at the time, 
so I agree we should look at the accounting in DeadServer.java.

"Not running balancer because processing dead regionserver(s)" is printed from 
HMaster.java:1846 based on the result from 
ServerManager#areDeadServersInProgress, which passes through the result from 
DeadServer#areDeadServersInProgress, which is simply

{code}
  public synchronized boolean areDeadServersInProgress() { return processing; }
{code}

This boolean is cleared in DeadServer#finish when
{code}
if (numProcessing == 0) { processing = false; }
{code}

So the first question I have is why do we even need this boolean field? It can 
be derived cheaply from other state. In areDeadServersInProgress just 
return the result of {{!(numProcessing == 0)}}. 
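
A rough sketch of what deriving the flag could look like, under stated assumptions: DerivedFlagSketch and startProcessing are hypothetical names, and only numProcessing, finish, and areDeadServersInProgress come from the code discussed above. This is not the actual DeadServer implementation.

```java
public class DerivedFlagSketch {
  // Stand-in for DeadServer's numProcessing counter.
  private int numProcessing = 0;

  // Hypothetical entry point: a dead server starts being processed.
  public synchronized void startProcessing() {
    numProcessing++;
  }

  // A dead server has finished being processed.
  public synchronized void finish() {
    numProcessing--;
  }

  // Derived from the counter, so there is no separate boolean to keep in sync.
  public synchronized boolean areDeadServersInProgress() {
    return !(numProcessing == 0);
  }
}
```

Note that this only removes the redundant field. If numProcessing itself is miscounted, the derived value is wrong too, so the accounting still needs a look.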

That assert you observed should be replaced with a Preconditions check so that 
a violation throws a RuntimeException, which will get noticed even when 
asserts are disabled. 
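
A minimal sketch of that kind of check, assuming Preconditions means Guava's, where Preconditions.checkState throws IllegalStateException when its condition is false. To keep the example compilable without Guava it uses a local checkState stand-in. FinishCheckSketch, startProcessing, getNumProcessing, and the error message are all hypothetical, and checking before the decrement is my own choice here, not necessarily where the existing assert sits.

```java
public class FinishCheckSketch {
  // Stand-in for DeadServer's numProcessing counter.
  private int numProcessing = 0;

  // Local stand-in for Guava's Preconditions.checkState(boolean, Object).
  private static void checkState(boolean expression, String message) {
    if (!expression) {
      throw new IllegalStateException(message);
    }
  }

  // Hypothetical entry point: a dead server starts being processed.
  public synchronized void startProcessing() {
    numProcessing++;
  }

  public synchronized void finish() {
    // Unlike an assert, this runs whether or not the JVM was started with -ea.
    // Checking first means a bad call leaves the counter untouched.
    checkState(numProcessing > 0, "finish() called with no dead servers in processing");
    numProcessing--;
  }

  public synchronized int getNumProcessing() {
    return numProcessing;
  }
}
```

With this shape an unbalanced finish() call fails loudly at the call site instead of silently driving the counter negative.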


was (Author: apurtell):
bq.  "Number of dead servers in processing should always be non-negative"

You are looking at that assert in DeadServer#finish, right? Those aren't 
evaluated unless the JVM is started with the -ea command line flag, which I 
didn't do. 

We can see from the log line I did see that the dead server map was empty at 
the time so I agree we should look at accounting in DeadServer.java.

"Not running balancer because processing dead regionserver(s)" is printed from 
HMaster.java:1846 based on the result from 
ServerManager#areDeadServersInProgress, which passes through the result from 
DeadServer#areDeadServersInProgress, which is simply

{code}
  public synchronized boolean areDeadServersInProgress() { return processing; }
{code}

This boolean is cleared in DeadServer#finish when
{code}
if (numProcessing == 0) { processing = false; }
{code}

So the first question I have is why do we even need this boolean field? It can 
easily be derived cheaply from other state. In areDeadServersInProgress just 
return the result of {{numProcessing == 0}}. 

That assert you observed should be replaced by use of Preconditions so we will 
get a RuntimeException that will get noticed. 

> Not running balancer because processing dead regionservers, but empty dead rs 
> list
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-21266
>                 URL: https://issues.apache.org/jira/browse/HBASE-21266
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.4.8
>            Reporter: Andrew Purtell
>            Priority: Major
>             Fix For: 1.5.0, 1.4.9
>
>
> Found during ITBLL testing. AM in master gets into a state where manual 
> attempts from the shell to run the balancer always return false and this is 
> printed in the master log:
> 2018-10-03 19:17:14,892 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=21,queue=0,port=8100] master.HMaster: 
> Not running balancer because processing dead regionserver(s): 
> Note the empty list. 
> This errant state did not recover without intervention by way of master 
> restart, but the test environment was chaotic so needs investigation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
