Devaraj Das created HBASE-14536:
-----------------------------------
Summary: Balancer & SSH interfering with each other leading to
unavailability
Key: HBASE-14536
URL: https://issues.apache.org/jira/browse/HBASE-14536
Project: HBase
Issue Type: Bug
Components: master, Region Assignment
Reporter: Devaraj Das
We came across this in our cluster:
1. The meta region was assigned to server 10.0.0.149,16020,1443507203340:
{noformat}
2015-09-29 06:16:22,472 DEBUG [AM.ZK.Worker-pool2-t56]
master.RegionStates: Onlined 1588230740 on
10.0.0.149,16020,1443507203340 {ENCODED => 1588230740, NAME =>
'hbase:meta,,1', STARTKEY => '', ENDKEY => ''}
{noformat}
2. The server dies at some point:
{noformat}
2015-09-29 06:18:25,952 INFO [main-EventThread]
zookeeper.RegionServerTracker: RegionServer ephemeral node deleted,
processing expiration [10.0.0.149,16020,1443507203340]
2015-09-29 06:18:25,955 DEBUG [main-EventThread] master.AssignmentManager:
based on AM, current
region=hbase:meta,,1.1588230740 is on server=10.0.0.149,16020,1443507203340
server being checked:
10.0.0.149,16020,1443507203340
{noformat}
3. The balancer had already computed a plan that contained a move for the meta region, with the dead server as the source:
{noformat}
2015-09-29 06:18:26,833 INFO
[B.defaultRpcServer.handler=12,queue=0,port=16000] master.HMaster:
balance hri=hbase:meta,,1.1588230740,
src=10.0.0.149,16020,1443507203340, dest=10.0.0.205,16020,1443507257905
{noformat}
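4. The following then ensues, leaving the meta unassigned: the balancer's move offlines the region and skips the assign because the dead server has not been processed yet, and MetaServerShutdownHandler then finds the region on server=null and skips assigning as well: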
{noformat}
2015-09-29 06:18:26,859 DEBUG
[B.defaultRpcServer.handler=12,queue=0,port=16000]
master.AssignmentManager: Offline hbase:meta,,1.1588230740, no need to
unassign since it's on a dead server: 10.0.0.149,16020,1443507203340
......................
2015-09-29 06:18:26,899 INFO
[B.defaultRpcServer.handler=12,queue=0,port=16000] master.RegionStates:
Offlined 1588230740 from 10.0.0.149,16020,1443507203340
.....................
2015-09-29 06:18:26,914 INFO
[B.defaultRpcServer.handler=12,queue=0,port=16000]
master.AssignmentManager: Skip assigning hbase:meta,,1.1588230740, it is
on a dead but not processed yet server: 10.0.0.149,16020,1443507203340
....................
2015-09-29 06:18:26,915 DEBUG [AM.ZK.Worker-pool2-t58]
master.AssignmentManager: Znode hbase:meta,,1.1588230740 deleted,
state: {1588230740 state=OFFLINE, ts=1443507506914,
server=10.0.0.149,16020,1443507203340}
....................
2015-09-29 06:18:29,447 DEBUG
[MASTER_META_SERVER_OPERATIONS-10.0.0.148:16000-2] master.AssignmentManager:
based on AM, current
region=hbase:meta,,1.1588230740 is on server=null server being checked:
10.0.0.149,16020,1443507203340
2015-09-29 06:18:29,451 INFO [MASTER_META_SERVER_OPERATIONS-
10.0.0.148:16000-2] handler.MetaServerShutdownHandler: META has been
assigned to otherwhere, skip assigning.
2015-09-29 06:18:29,452 DEBUG
[MASTER_META_SERVER_OPERATIONS-10.0.0.148:16000-2]
master.DeadServer: Finished processing 10.0.0.149,16020,1443507203340
{noformat}
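The interleaving in steps 1-4 can be replayed with a small toy model. This is only a sketch of the ordering suggested by the logs, not HBase code: the class MetaRaceSketch, its fields, and its methods are all hypothetical, and the decision rules are simplified readings of the log messages ("Skip assigning ... dead but not processed yet server" and "META has been assigned to otherwhere, skip assigning").

```java
import java.util.HashSet;
import java.util.Set;

/** Toy replay of the log sequence; all names here are hypothetical, not HBase APIs. */
class MetaRaceSketch {
    static final String DEAD = "10.0.0.149,16020,1443507203340";
    static final String DEST = "10.0.0.205,16020,1443507257905";

    // Step 1: the master's view is that meta is online on the server that later dies.
    String metaLocation = DEAD;
    // Servers whose expiration was seen but whose shutdown handling has not finished.
    final Set<String> deadNotProcessed = new HashSet<>();

    // Step 2: ephemeral node deleted, expiration processing is queued.
    void expireServer(String server) {
        deadNotProcessed.add(server);
    }

    // Steps 3-4: the balancer executes its (now stale) move plan.
    void balancerMove(String src, String dest) {
        metaLocation = null;                  // "Offlined 1588230740 from ..."
        if (deadNotProcessed.contains(src)) {
            return;                           // "Skip assigning ... dead but not processed yet"
        }
        metaLocation = dest;
    }

    // Step 4 (end): the shutdown handler for the dead server runs.
    void processDeadServer(String server, String fallback) {
        if (server.equals(metaLocation)) {
            metaLocation = fallback;          // meta still on the dead server: reassign it
        }                                     // otherwise: "assigned to otherwhere, skip assigning"
        deadNotProcessed.remove(server);      // "Finished processing ..."
    }

    /** Returns where the toy master believes meta is after the sequence; null means unassigned. */
    static String replay(boolean balancerRuns) {
        MetaRaceSketch m = new MetaRaceSketch();
        m.expireServer(DEAD);
        if (balancerRuns) {
            m.balancerMove(DEAD, DEST);
        }
        m.processDeadServer(DEAD, DEST);
        return m.metaLocation;
    }

    public static void main(String[] args) {
        System.out.println("balancer move interleaved: meta on " + replay(true));
        System.out.println("no balancer move:          meta on " + replay(false));
    }
}
```

In this model, replay(true) ends with meta unassigned because each path defers to the other: the balancer skips the assign while the dead server is unprocessed, and the shutdown handler skips it because the region is no longer recorded on that server. Choosing DEST as the reassignment target in the control case is an arbitrary assumption of the sketch.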