Rajeshbabu Chintaguntla created HBASE-12901:
-----------------------------------------------

             Summary: Possible deadlock while onlining a region and get region plan for other region run parallel
                 Key: HBASE-12901
                 URL: https://issues.apache.org/jira/browse/HBASE-12901
             Project: HBase
          Issue Type: Bug
            Reporter: Rajeshbabu Chintaguntla
            Assignee: Rajeshbabu Chintaguntla
            Priority: Critical
             Fix For: 1.0.0, 1.1.0


There is a deadlock when updating region state (regionOnline) after an 
assignment completes while, in parallel, getting the region plan for another 
region. When onlining, we synchronize on regionStates and, inside that block, 
synchronize on regionPlans to clear the region plan. At the same time, while 
getting a plan, we may first synchronize on regionPlans and then on 
regionStates to get the assignments of a server. This started happening after 
the HBASE-12686 fix. The issue is present in branch-1 and branch-1.1 only. 
{code}
"AM.-pool1-t33":
        at org.apache.hadoop.hbase.master.AssignmentManager.clearRegionPlan(AssignmentManager.java:2917)
        - waiting to lock <0x00000000d0147f70> (a java.util.TreeMap)
        at org.apache.hadoop.hbase.master.AssignmentManager.regionOffline(AssignmentManager.java:3617)
        at org.apache.hadoop.hbase.master.AssignmentManager.regionOffline(AssignmentManager.java:1402)
        at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1734)
        at org.apache.hadoop.hbase.master.AssignmentManager.forceRegionStateToOffline(AssignmentManager.java:1821)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1456)
        at org.apache.hadoop.hbase.master.AssignCallable.call(AssignCallable.java:45)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
"AM.-pool1-t29":
        at org.apache.hadoop.hbase.master.RegionStates.getRegionAssignments(RegionStates.java:155)
        - waiting to lock <0x00000000d010b250> (a org.apache.hadoop.hbase.master.RegionStates)
        at org.apache.hadoop.hbase.master.AssignmentManager.getSnapShotOfAssignment(AssignmentManager.java:3629)
        at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer.getRegionAssignmentsByServer(BaseLoadBalancer.java:1146)
        at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer.createCluster(BaseLoadBalancer.java:959)
        at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer.randomAssignment(BaseLoadBalancer.java:1010)
        at org.apache.hadoop.hbase.master.AssignmentManager.getRegionPlan(AssignmentManager.java:2228)
        - locked <0x00000000d0147f70> (a java.util.TreeMap)
        at org.apache.hadoop.hbase.master.AssignmentManager.getRegionPlan(AssignmentManager.java:2185)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1905)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1464)
        at org.apache.hadoop.hbase.master.AssignCallable.call(AssignCallable.java:45)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
"AM.ZK.Worker-pool2-t41":
        at org.apache.hadoop.hbase.master.AssignmentManager.clearRegionPlan(AssignmentManager.java:2917)
        - waiting to lock <0x00000000d0147f70> (a java.util.TreeMap)
        at org.apache.hadoop.hbase.master.AssignmentManager.regionOnline(AssignmentManager.java:1305)
        at org.apache.hadoop.hbase.master.AssignmentManager$4.run(AssignmentManager.java:1196)
        - locked <0x00000000d010b250> (a org.apache.hadoop.hbase.master.RegionStates)
        at org.apache.hadoop.hbase.master.AssignmentManager$3.run(AssignmentManager.java:1142)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
{code}
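
In the dump, AM.-pool1-t29 holds the regionPlans TreeMap and waits for 
RegionStates, while AM.ZK.Worker-pool2-t41 holds RegionStates and waits for 
the TreeMap. A minimal sketch of one conventional way to break such a cycle 
is below: take the assignment snapshot under regionStates before locking 
regionPlans, so the plan path never holds both monitors. The class 
LockOrderSketch, its fields, and its method signatures are hypothetical 
stand-ins for illustration, not HBase code, and this is not necessarily the 
change committed for this issue.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the two monitors seen in the traces; not HBase code. */
class LockOrderSketch {
    // Stand-ins for the RegionStates monitor and the regionPlans TreeMap.
    private final Object regionStates = new Object();
    private final Map<String, String> regionPlans = new TreeMap<>();
    // Region -> server; guarded by regionStates.
    private final Map<String, String> assignments = new HashMap<>();

    /** Online path: regionStates first, then regionPlans (same order as the ZK worker trace). */
    void regionOnline(String region, String server) {
        synchronized (regionStates) {
            assignments.put(region, server);
            synchronized (regionPlans) {
                regionPlans.remove(region); // clear the plan once the region is online
            }
        }
    }

    /** Plan path, reordered: read assignments BEFORE taking regionPlans, never while holding it. */
    String getRegionPlan(String region, String fallbackServer) {
        Map<String, String> snapshot;
        synchronized (regionStates) {
            snapshot = new HashMap<>(assignments); // copy, then release regionStates
        }
        synchronized (regionPlans) {
            String existing = regionPlans.get(region);
            if (existing != null) {
                return existing;
            }
            String target = snapshot.getOrDefault(region, fallbackServer);
            regionPlans.put(region, target);
            return target;
        }
    }

    /** Runs both paths in parallel; returns false if either fails or does not finish in time. */
    static boolean runConcurrently(int iterations) {
        LockOrderSketch am = new LockOrderSketch();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<?> onliner = pool.submit(() -> {
                for (int i = 0; i < iterations; i++) {
                    am.regionOnline("region-" + i, "server-a");
                }
            });
            Future<?> planner = pool.submit(() -> {
                for (int i = 0; i < iterations; i++) {
                    am.getRegionPlan("region-" + i, "server-b");
                }
            });
            onliner.get(30, TimeUnit.SECONDS);
            planner.get(30, TimeUnit.SECONDS);
            return true;
        } catch (Exception e) {
            return false; // a TimeoutException here would point to a hang
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The trade-off in this sketch is that the plan is computed from a snapshot 
that may be slightly stale by the time regionPlans is locked. Whether that is 
acceptable for the real getRegionPlan path is a judgment for the actual fix.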



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
