Nick Dimiduk created HBASE-10632:
------------------------------------
Summary: Region lost in limbo after ArrayIndexOutOfBoundsException
during assignment
Key: HBASE-10632
URL: https://issues.apache.org/jira/browse/HBASE-10632
Project: HBase
Issue Type: Bug
Components: Region Assignment
Affects Versions: hbase-10070
Reporter: Nick Dimiduk
Assignee: Enis Soztutar
Fix For: 0.99.0, hbase-10070
Discovered while running IntegrationTestBigLinkedList. Region
24d68aa7239824e42390a77b7212fcbf was scheduled to move from hor13n19 to
hor13n13. During the move, the master threw an exception while processing
the M_SERVER_SHUTDOWN event for hor13n19.
{noformat}
2014-02-25 15:30:42,613 INFO [MASTER_SERVER_OPERATIONS-hor13n12:60000-4]
master.RegionStates: Transitioning {24d68aa7239824e42390a77b7212fcbf
state=OPENING, ts=1393342207107,
server=hor13n19.gq1.ygridcore.net,60020,1393341563552} will be handled by SSH
for hor13n19.gq1.ygridcore.net,60020,1393341563552
2014-02-25 15:30:42,613 INFO [MASTER_SERVER_OPERATIONS-hor13n12:60000-4]
handler.ServerShutdownHandler: Reassigning 7 region(s) that
hor13n19.gq1.ygridcore.net,60020,1393341563552 was carrying (and 0 regions(s)
that were opening on this server)
2014-02-25 15:30:42,613 INFO [MASTER_SERVER_OPERATIONS-hor13n12:60000-4]
handler.ServerShutdownHandler: Reassigning region with rs =
{24d68aa7239824e42390a77b7212fcbf state=OPENING, ts=1393342207107,
server=hor13n19.gq1.ygridcore.net,60020,1393341563552} and deleting zk node if
exists
2014-02-25 15:30:42,623 INFO [MASTER_SERVER_OPERATIONS-hor13n12:60000-4]
master.RegionStates: Transitioned {24d68aa7239824e42390a77b7212fcbf
state=OPENING, ts=1393342207107,
server=hor13n19.gq1.ygridcore.net,60020,1393341563552} to
{24d68aa7239824e42390a77b7212fcbf state=OFFLINE, ts=1393342242623,
server=hor13n19.gq1.ygridcore.net,60020,1393341563552}
2014-02-25 15:30:42,623 DEBUG [AM.ZK.Worker-pool2-t46]
master.AssignmentManager: Znode
IntegrationTestBigLinkedList,\x80\x06\x1A,1393342105093.24d68aa7239824e42390a77b7212fcbf.
deleted, state: {24d68aa7239824e42390a77b7212fcbf state=OFFLINE,
ts=1393342242623, server=hor13n19.gq1.ygridcore.net,60020,1393341563552}
...
2014-02-25 15:30:43,993 ERROR [MASTER_SERVER_OPERATIONS-hor13n12:60000-4]
executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN
java.lang.ArrayIndexOutOfBoundsException: 0
at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer$Cluster.<init>(BaseLoadBalancer.java:250)
at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer.createCluster(BaseLoadBalancer.java:921)
at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer.roundRobinAssignment(BaseLoadBalancer.java:860)
at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2482)
at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:282)
at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
{noformat}
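The message "ArrayIndexOutOfBoundsException: 0" means that index 0 was read from a zero-length array inside the BaseLoadBalancer$Cluster constructor. The log does not show which array was empty. One plausible cause, stated here only as an assumption, is that round-robin assignment was invoked with no candidate servers. A minimal Java sketch of that failure mode and a possible guard follows; the class and method names are hypothetical and are not HBase APIs.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for illustration only; this is not HBase code.
final class RoundRobinSketch {

    // Unguarded: copies the candidate servers into an array and reads slot 0.
    // With an empty list the array has length 0, so index 0 is out of bounds.
    static String firstServerUnguarded(List<String> servers) {
        String[] array = servers.toArray(new String[0]);
        return array[0];
    }

    // Guarded: reports "no server available" instead of throwing, so the
    // caller can decide to retry the assignment later.
    static Optional<String> firstServerGuarded(List<String> servers) {
        if (servers.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(servers.get(0));
    }
}
```

Calling firstServerUnguarded with an empty list throws ArrayIndexOutOfBoundsException, while firstServerGuarded returns an empty Optional that the caller has to handle explicitly.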
After the exception, the region is left in limbo (stuck in transition with
state=OFFLINE) and is never reassigned.
{noformat}
2014-02-25 15:35:11,581 INFO [FifoRpcScheduler.handler1-thread-6]
master.HMaster: Client=hrt_qa//68.142.246.29 move
hri=IntegrationTestBigLinkedList,\x80\x06\x1A,1393342105093.24d68aa7239824e42390a77b7212fcbf.,
src=hor13n19.gq1.ygridcore.net,60020,1393341563552,
dest=hor13n13.gq1.ygridcore.net,60020,1393342222275, running balancer
2014-02-25 15:35:11,581 INFO [FifoRpcScheduler.handler1-thread-6]
master.AssignmentManager: Ignored moving region not assigned: {ENCODED =>
24d68aa7239824e42390a77b7212fcbf, NAME =>
'IntegrationTestBigLinkedList,\x80\x06\x1A,1393342105093.24d68aa7239824e42390a77b7212fcbf.',
STARTKEY => '\x80\x06\x1A', ENDKEY => ''}, {24d68aa7239824e42390a77b7212fcbf
state=OFFLINE, ts=1393342242623,
server=hor13n19.gq1.ygridcore.net,60020,1393341563552}
...
2014-02-25 15:35:26,586 DEBUG
[hor13n12.gq1.ygridcore.net,60000,1393341917402-BalancerChore] master.HMaster:
Not running balancer because 1 region(s) in transition:
{24d68aa7239824e42390a77b7212fcbf={24d68aa7239824e42390a77b7212fcbf
state=OFFLINE, ts=1393342242623,
server=hor13n19.gq1.ygridcore.net,60020,1393341563552}}
...
2014-02-25 15:35:51,945 DEBUG [FifoRpcScheduler.handler1-thread-16]
master.HMaster: Client=hrt_qa//68.142.246.29 unassign
IntegrationTestBigLinkedList,\x80\x06\x1A,1393342105093.24d68aa7239824e42390a77b7212fcbf.
in current location if it is online and reassign.force=false
2014-02-25 15:35:51,945 DEBUG [FifoRpcScheduler.handler1-thread-16]
master.AssignmentManager: Starting unassign of
IntegrationTestBigLinkedList,\x80\x06\x1A,1393342105093.24d68aa7239824e42390a77b7212fcbf.
(offlining), current state: {24d68aa7239824e42390a77b7212fcbf state=OFFLINE,
ts=1393342242623, server=hor13n19.gq1.ygridcore.net,60020,1393341563552}
2014-02-25 15:35:51,945 DEBUG [FifoRpcScheduler.handler1-thread-16]
master.AssignmentManager: Attempting to unassign
IntegrationTestBigLinkedList,\x80\x06\x1A,1393342105093.24d68aa7239824e42390a77b7212fcbf.
but it is already in transition (OFFLINE, force=false)
...
2014-02-25 15:40:26,587 DEBUG
[hor13n12.gq1.ygridcore.net,60000,1393341917402-BalancerChore] master.HMaster:
Not running balancer because 1 region(s) in transition:
{24d68aa7239824e42390a77b7212fcbf={24d68aa7239824e42390a77b7212fcbf
state=OFFLINE, ts=1393342242623,
server=hor13n19.gq1.ygridcore.net,60020,1393341563552}}
{noformat}
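The logs above are consistent with the following sequence: the shutdown handler marks the region OFFLINE, the assignment step throws, and nothing clears the in-transition entry afterwards, so later move and unassign requests are refused and the balancer does not run. The Java sketch below is a simplified model of that sequence under those assumptions. It is not the AssignmentManager implementation, and all names in it are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the observed behavior; not HBase code.
final class LimboSketch {
    enum State { OPENING, OFFLINE }

    private final Map<String, State> regionsInTransition = new HashMap<>();

    // Models the shutdown handler: mark the region OFFLINE, then plan the
    // assignment. If planAssignment throws, the remove() below never runs
    // and the region stays in transition.
    void handleServerShutdown(String region, Runnable planAssignment) {
        regionsInTransition.put(region, State.OFFLINE);
        planAssignment.run();
        regionsInTransition.remove(region);
    }

    // Models the later client requests: a move is ignored while the region
    // is still in transition.
    boolean requestMove(String region) {
        return !regionsInTransition.containsKey(region);
    }

    // Models the balancer chore: it does not run while any region is in transition.
    boolean balancerMayRun() {
        return regionsInTransition.isEmpty();
    }
}
```

In this model, once planAssignment throws, requestMove returns false and balancerMayRun returns false indefinitely. That matches the "Ignored moving region not assigned" and "Not running balancer because 1 region(s) in transition" messages. Any real fix would need to either prevent the exception or retry the assignment after it.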
Spoke with [~enis] about it earlier; assigning to him.