[ https://issues.apache.org/jira/browse/HBASE-12901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ted Yu updated HBASE-12901:
---------------------------
Attachment: 12901-suggest.txt
Nice finding, Rajesh.
The current patch calls balancer.randomAssignment() even when no new plan is
needed, so it may acquire (and wait on) the RegionStates lock unnecessarily.
For your reference, here is an alternative patch in which the RegionStates lock
is acquired only when a new plan is needed.
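A minimal sketch of the idea, assuming a heavily simplified, hypothetical `PlanSketch` class rather than the real AssignmentManager (it is not the attached 12901-suggest.txt). An existing plan is checked first under the regionPlans lock, and the balancer, which takes the RegionStates lock, is called only when a new plan is needed. As an extra assumption of this sketch, the balancer call also runs after regionPlans has been released, so the regionPlans-then-RegionStates nesting seen in the thread dump never occurs. `randomAssignment`, `regionStatesLockCount`, and the server names are illustrative stand-ins.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical, heavily simplified stand-in for AssignmentManager; not the HBase class. */
class PlanSketch {
    // Stand-ins for the two monitors in the thread dump: regionPlans (a TreeMap) and RegionStates.
    final Map<String, String> regionPlans = new TreeMap<>();
    final Object regionStates = new Object();
    final List<String> onlineServers = Arrays.asList("rs1", "rs2");
    // Counts how often the RegionStates monitor is taken, so the effect of the change is observable.
    final AtomicInteger regionStatesLockCount = new AtomicInteger();

    /** Hypothetical balancer call: reads cluster state while holding the RegionStates monitor. */
    String randomAssignment(String region) {
        synchronized (regionStates) {
            regionStatesLockCount.incrementAndGet();
            return onlineServers.get(Math.floorMod(region.hashCode(), onlineServers.size()));
        }
    }

    /** Returns a plan, calling the balancer (and so locking RegionStates) only when needed. */
    String getRegionPlan(String region, boolean forceNewPlan) {
        synchronized (regionPlans) {
            String existing = regionPlans.get(region);
            if (existing != null && !forceNewPlan) {
                return existing; // Reuse: RegionStates is never touched on this path.
            }
        }
        // A new plan is needed. The balancer runs with regionPlans released,
        // so this thread never holds regionPlans while waiting for RegionStates.
        String candidate = randomAssignment(region);
        synchronized (regionPlans) {
            regionPlans.put(region, candidate);
            return candidate;
        }
    }
}
```

One trade-off of releasing regionPlans during the balancer call: two threads that both need a new plan for the same region may each compute one, and the last `put` wins.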
> Possible deadlock while onlining a region and get region plan for other
> region run parallel
> -------------------------------------------------------------------------------------------
>
> Key: HBASE-12901
> URL: https://issues.apache.org/jira/browse/HBASE-12901
> Project: HBase
> Issue Type: Bug
> Reporter: Rajeshbabu Chintaguntla
> Assignee: Rajeshbabu Chintaguntla
> Priority: Critical
> Fix For: 1.0.0, 1.1.0
>
> Attachments: 12901-suggest.txt, HBASE-12901.patch
>
>
> A deadlock can occur when a region's state is updated (regionOnline) after its
> assignment completes while, in parallel, a region plan is being computed for
> another region. Before onlining, we synchronize on regionStates and then, inside
> that block, on regionPlans to clear the region plan. At the same time, a thread
> getting a plan may first synchronize on regionPlans and then on regionStates
> while fetching the assignments of a server. This started after the HBASE-12686
> fix. The issue is present only in branch-1 and branch-1.1.
> {code}
> "AM.-pool1-t33":
> at org.apache.hadoop.hbase.master.AssignmentManager.clearRegionPlan(AssignmentManager.java:2917)
> - waiting to lock <0x00000000d0147f70> (a java.util.TreeMap)
> at org.apache.hadoop.hbase.master.AssignmentManager.regionOffline(AssignmentManager.java:3617)
> at org.apache.hadoop.hbase.master.AssignmentManager.regionOffline(AssignmentManager.java:1402)
> at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1734)
> at org.apache.hadoop.hbase.master.AssignmentManager.forceRegionStateToOffline(AssignmentManager.java:1821)
> at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1456)
> at org.apache.hadoop.hbase.master.AssignCallable.call(AssignCallable.java:45)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> "AM.-pool1-t29":
> at org.apache.hadoop.hbase.master.RegionStates.getRegionAssignments(RegionStates.java:155)
> - waiting to lock <0x00000000d010b250> (a org.apache.hadoop.hbase.master.RegionStates)
> at org.apache.hadoop.hbase.master.AssignmentManager.getSnapShotOfAssignment(AssignmentManager.java:3629)
> at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer.getRegionAssignmentsByServer(BaseLoadBalancer.java:1146)
> at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer.createCluster(BaseLoadBalancer.java:959)
> at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer.randomAssignment(BaseLoadBalancer.java:1010)
> at org.apache.hadoop.hbase.master.AssignmentManager.getRegionPlan(AssignmentManager.java:2228)
> - locked <0x00000000d0147f70> (a java.util.TreeMap)
> at org.apache.hadoop.hbase.master.AssignmentManager.getRegionPlan(AssignmentManager.java:2185)
> at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1905)
> at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1464)
> at org.apache.hadoop.hbase.master.AssignCallable.call(AssignCallable.java:45)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> "AM.ZK.Worker-pool2-t41":
> at org.apache.hadoop.hbase.master.AssignmentManager.clearRegionPlan(AssignmentManager.java:2917)
> - waiting to lock <0x00000000d0147f70> (a java.util.TreeMap)
> at org.apache.hadoop.hbase.master.AssignmentManager.regionOnline(AssignmentManager.java:1305)
> at org.apache.hadoop.hbase.master.AssignmentManager$4.run(AssignmentManager.java:1196)
> - locked <0x00000000d010b250> (a org.apache.hadoop.hbase.master.RegionStates)
> at org.apache.hadoop.hbase.master.AssignmentManager$3.run(AssignmentManager.java:1142)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}
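In the dump, "AM.ZK.Worker-pool2-t41" holds the RegionStates monitor (0x...d010b250) and waits for the regionPlans TreeMap (0x...d0147f70), while "AM.-pool1-t29" holds the TreeMap and waits for RegionStates; "AM.-pool1-t33" is merely blocked behind t29 on the TreeMap. The sketch below reproduces that two-lock inversion with plain monitors and asks the JVM to confirm it through `ThreadMXBean.findMonitorDeadlockedThreads()`. The class, method, and thread names (`LockInversionDemo`, "online-path", "plan-path") are hypothetical stand-ins; this is not HBase code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

/** Hypothetical reproduction of the lock-order inversion from the thread dump (not HBase code). */
class LockInversionDemo {

    /** Starts two conflicting threads and returns the names of the deadlocked threads the JVM reports. */
    static String[] reproduce() {
        // Fresh stand-ins for the RegionStates monitor and the regionPlans TreeMap monitor.
        Object regionStates = new Object();
        Object regionPlans = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Like "AM.ZK.Worker-pool2-t41": holds RegionStates, then clearRegionPlan wants regionPlans.
        Thread online = new Thread(() -> lockInOrder(regionStates, regionPlans, bothHoldFirstLock), "online-path");
        // Like "AM.-pool1-t29": getRegionPlan holds regionPlans, then the balancer wants RegionStates.
        Thread plan = new Thread(() -> lockInOrder(regionPlans, regionStates, bothHoldFirstLock), "plan-path");
        online.setDaemon(true); // daemon threads so the JVM can still exit after the demo
        plan.setDaemon(true);
        online.start();
        plan.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 100; attempt++) { // poll for up to about 5 seconds
            long[] ids = mx.findMonitorDeadlockedThreads();
            if (ids != null) {
                ThreadInfo[] infos = mx.getThreadInfo(ids);
                String[] names = new String[infos.length];
                for (int i = 0; i < infos.length; i++) {
                    names[i] = infos[i].getThreadName();
                }
                return names;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return new String[0];
    }

    /** Takes `first`, waits until the other thread holds its first lock, then tries `second`. */
    static void lockInOrder(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await(); // ensure the other thread already holds its first lock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // Never reached: each thread now waits for the monitor the other one holds.
            }
        }
    }
}
```

The usual remedies are to take the two monitors in a single global order everywhere, or to avoid calling into the balancer (and thus RegionStates) while regionPlans is held.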
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)