[ 
https://issues.apache.org/jira/browse/HBASE-21266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16645461#comment-16645461
 ] 

Andrew Purtell commented on HBASE-21266:
----------------------------------------

With the latest patch this issue can no longer be reproduced, and the AM is 
stable in ITBLL testing of 500M rows with the serverKilling chaos policy, which 
completes successfully. Verified with added debug logging in DeadServer, 
periodic hbck invocation (the cluster always returned to a state with 0 
inconsistencies detected), periodic balancer invocation, and the unit test 
suite. 

We no longer rely on an integer counter and a boolean to track the processing 
status of dead servers. Instead, DeadServer uses a Set from which the expected 
state checks are derived. Logging is improved, and there is a new 
runtime-visible assert for incorrect API usage (which does not fire in any of 
the testing). 
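
To illustrate the idea of deriving state from a Set rather than from a counter 
plus a boolean, here is a minimal sketch. It is not the code from the patch: 
the class name DeadServerTracker, its method names, and the use of 
IllegalStateException as a stand-in for the runtime-visible assert are all 
assumptions made for illustration only.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration; names and behavior are assumptions, not the
// actual DeadServer implementation in HBase.
class DeadServerTracker {
    // Servers whose processing has started but not yet finished.
    private final Set<String> processing = new HashSet<>();

    // Record that processing of a dead server has begun.
    synchronized void startProcessing(String serverName) {
        processing.add(serverName);
    }

    // Record that processing has finished. Finishing a server that was never
    // started is incorrect API usage, so fail loudly instead of letting the
    // tracked state drift (stand-in for the assert mentioned above).
    synchronized void finishProcessing(String serverName) {
        if (!processing.remove(serverName)) {
            throw new IllegalStateException(
                "finish called for a server not being processed: " + serverName);
        }
    }

    // "In progress" is derived from the Set, so it cannot report true while
    // the list of servers being processed is empty.
    synchronized boolean areDeadServersInProgress() {
        return !processing.isEmpty();
    }

    // Sorted, read-only snapshot suitable for log messages.
    synchronized Set<String> copyOfProcessing() {
        return Collections.unmodifiableSet(new TreeSet<>(processing));
    }
}
```

With a counter and a separate boolean, an unbalanced start/finish pair can 
leave the flag set with nothing left to process, which matches the symptom in 
the log line quoted below. In this sketch the flag and the list are the same 
data, so they cannot disagree.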

> Not running balancer because processing dead regionservers, but empty dead rs 
> list
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-21266
>                 URL: https://issues.apache.org/jira/browse/HBASE-21266
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.4.8
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
>            Priority: Major
>             Fix For: 1.5.0, 1.4.9
>
>         Attachments: HBASE-21266-branch-1.patch, HBASE-21266-branch-1.patch, 
> HBASE-21266-branch-1.patch, HBASE-21266-branch-1.patch, 
> HBASE-21266-branch-1.patch, HBASE-21266-branch-1.patch, 
> HBASE-21266-branch-1.patch, HBASE-21266-branch-1.patch
>
>
> Found during ITBLL testing. AM in master gets into a state where manual 
> attempts from the shell to run the balancer always return false and this is 
> printed in the master log:
> 2018-10-03 19:17:14,892 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=21,queue=0,port=8100] master.HMaster: 
> Not running balancer because processing dead regionserver(s): 
> Note the empty list. 
> This errant state did not recover without intervention by way of master 
> restart, but the test environment was chaotic so needs investigation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
