Rajeshbabu Chintaguntla created HBASE-12901:
-----------------------------------------------
Summary: Possible deadlock while onlining a region and get region
plan for other region run parallel
Key: HBASE-12901
URL: https://issues.apache.org/jira/browse/HBASE-12901
Project: HBase
Issue Type: Bug
Reporter: Rajeshbabu Chintaguntla
Assignee: Rajeshbabu Chintaguntla
Priority: Critical
Fix For: 1.0.0, 1.1.0
A deadlock can occur when the region state update (regionOnline) that follows a
completed assignment runs in parallel with getting a region plan for another
region. Before onlining, we synchronize on regionStates and, inside that block,
synchronize on regionPlans to clear the region plan. At the same time, the
thread getting a plan may first synchronize on regionPlans and then on
regionStates while fetching the assignments of a server. This appeared after
the HBASE-12686 fix. The issue is present in branch-1 and branch-1.1 only.
{code}
"AM.-pool1-t33":
    at org.apache.hadoop.hbase.master.AssignmentManager.clearRegionPlan(AssignmentManager.java:2917)
    - waiting to lock <0x00000000d0147f70> (a java.util.TreeMap)
    at org.apache.hadoop.hbase.master.AssignmentManager.regionOffline(AssignmentManager.java:3617)
    at org.apache.hadoop.hbase.master.AssignmentManager.regionOffline(AssignmentManager.java:1402)
    at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1734)
    at org.apache.hadoop.hbase.master.AssignmentManager.forceRegionStateToOffline(AssignmentManager.java:1821)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1456)
    at org.apache.hadoop.hbase.master.AssignCallable.call(AssignCallable.java:45)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
"AM.-pool1-t29":
    at org.apache.hadoop.hbase.master.RegionStates.getRegionAssignments(RegionStates.java:155)
    - waiting to lock <0x00000000d010b250> (a org.apache.hadoop.hbase.master.RegionStates)
    at org.apache.hadoop.hbase.master.AssignmentManager.getSnapShotOfAssignment(AssignmentManager.java:3629)
    at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer.getRegionAssignmentsByServer(BaseLoadBalancer.java:1146)
    at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer.createCluster(BaseLoadBalancer.java:959)
    at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer.randomAssignment(BaseLoadBalancer.java:1010)
    at org.apache.hadoop.hbase.master.AssignmentManager.getRegionPlan(AssignmentManager.java:2228)
    - locked <0x00000000d0147f70> (a java.util.TreeMap)
    at org.apache.hadoop.hbase.master.AssignmentManager.getRegionPlan(AssignmentManager.java:2185)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1905)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1464)
    at org.apache.hadoop.hbase.master.AssignCallable.call(AssignCallable.java:45)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
"AM.ZK.Worker-pool2-t41":
    at org.apache.hadoop.hbase.master.AssignmentManager.clearRegionPlan(AssignmentManager.java:2917)
    - waiting to lock <0x00000000d0147f70> (a java.util.TreeMap)
    at org.apache.hadoop.hbase.master.AssignmentManager.regionOnline(AssignmentManager.java:1305)
    at org.apache.hadoop.hbase.master.AssignmentManager$4.run(AssignmentManager.java:1196)
    - locked <0x00000000d010b250> (a org.apache.hadoop.hbase.master.RegionStates)
    at org.apache.hadoop.hbase.master.AssignmentManager$3.run(AssignmentManager.java:1142)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
{code}
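The cycle in the dump is a classic lock-order inversion: one path holds the RegionStates monitor and waits for the regionPlans TreeMap, while the other holds the TreeMap and waits for RegionStates. The following is a minimal sketch of one common way to break such a cycle, assuming it is acceptable to copy the assignments under regionStates and release that monitor before taking regionPlans. It is not HBase code and not necessarily the change applied for this issue; the class LockOrderSketch, its fields, and the placeholder plan choice are all hypothetical stand-ins.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the two monitors seen in the thread dump; not HBase code.
class LockOrderSketch {
    private final Object regionStates = new Object();                 // plays the RegionStates monitor
    private final Map<String, String> regionPlans = new TreeMap<>();  // plays the regionPlans TreeMap
    private final Map<String, String> assignments = new TreeMap<>();  // guarded by regionStates

    // Same order as the regionOnline path in the dump: regionStates, then regionPlans.
    void regionOnline(String region, String server) {
        synchronized (regionStates) {
            assignments.put(region, server);
            synchronized (regionPlans) {
                regionPlans.remove(region);
            }
        }
    }

    // Copies the assignments under regionStates and releases that monitor
    // before taking regionPlans, so regionPlans is never held while
    // waiting for regionStates.
    String getRegionPlan(String region) {
        Map<String, String> snapshot;
        synchronized (regionStates) {
            snapshot = new TreeMap<>(assignments);
        }
        synchronized (regionPlans) {
            String plan = regionPlans.get(region);
            if (plan == null) {
                // Placeholder for a balancer decision based on the snapshot.
                plan = "server-" + (snapshot.size() % 3);
                regionPlans.put(region, plan);
            }
            return plan;
        }
    }

    // Runs both paths in parallel and reports whether both threads finished
    // within the join timeouts.
    static boolean runConcurrently(int iterations) {
        LockOrderSketch am = new LockOrderSketch();
        Thread onliner = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                am.regionOnline("r" + (i % 16), "s1");
            }
        });
        Thread planner = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                am.getRegionPlan("r" + (i % 16));
            }
        });
        onliner.setDaemon(true);
        planner.setDaemon(true);
        onliner.start();
        planner.start();
        try {
            onliner.join(10_000);
            planner.join(10_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !onliner.isAlive() && !planner.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("finished without deadlock: " + runConcurrently(10_000));
    }
}
```

The trade-off in this sketch is that the plan is computed from a snapshot that may be slightly stale by the time regionPlans is taken. The alternative of always acquiring regionStates before regionPlans on both paths avoids staleness but holds the outer monitor for longer.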