[ 
https://issues.apache.org/jira/browse/HBASE-21266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16644143#comment-16644143
 ] 

Andrew Purtell commented on HBASE-21266:
----------------------------------------

This is going to need more work.

With this change in place, a unit test in TestAssignmentManagerOnCluster will fail.

Also, in ITBLL test scenarios with the serverKilling policy, if the master is 
terminated while a region is splitting, then upon restart we can get this:

2018-10-09 20:59:46,242 WARN [ip-172-31-5-95:8100.activeMasterManager] 
master.AssignmentManager: Dropped splitting! Not in state good for SPLITTING; 
rs_p={332d04e88521c71ea4505592e434c9d1 state=SPLITTING, ts=1539118786241, 
server=ip-172-31-13-83.us-west-2.compute.internal,8120,1539118587733}, 
rs_a={1bbe77be39dfd903b31d00b98b02f842 state=OFFLINE, ts=1539118786229, 
server=null}, rs_b={6d6f67867f14d37c4fe35f3fe23f6cd8 state=OFFLINE, 
ts=1539118786230, server=null}

and the daughter regions will remain unassigned and unavailable, requiring hbck 
-fixAssignments. 

I think I see a mistake in the patch. Let me try again. 

> Not running balancer because processing dead regionservers, but empty dead rs 
> list
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-21266
>                 URL: https://issues.apache.org/jira/browse/HBASE-21266
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.4.8
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
>            Priority: Major
>             Fix For: 1.5.0, 1.4.9
>
>         Attachments: HBASE-21266-branch-1.patch
>
>
> Found during ITBLL testing. AM in master gets into a state where manual 
> attempts from the shell to run the balancer always return false and this is 
> printed in the master log:
> 2018-10-03 19:17:14,892 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=21,queue=0,port=8100] master.HMaster: 
> Not running balancer because processing dead regionserver(s): 
> Note the empty list. 
> This errant state did not recover without intervention by way of master 
> restart, but the test environment was chaotic so needs investigation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
