[ 
https://issues.apache.org/jira/browse/HBASE-12901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288899#comment-14288899
 ] 

Enis Soztutar commented on HBASE-12901:
---------------------------------------

Thanks Ted and Rajesh. This version (v2) seems better. +1

> Possible deadlock while onlining a region and get region plan for other 
> region run parallel
> -------------------------------------------------------------------------------------------
>
>                 Key: HBASE-12901
>                 URL: https://issues.apache.org/jira/browse/HBASE-12901
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Rajeshbabu Chintaguntla
>            Assignee: Rajeshbabu Chintaguntla
>            Priority: Critical
>             Fix For: 1.0.0, 1.1.0
>
>         Attachments: 12901-suggest.txt, HBASE-12901.patch, 
> HBASE-12901_v2.patch
>
>
> A deadlock can occur when a region state update (regionOnline), run after an 
> assignment completes, executes in parallel with getting a region plan for 
> another region. Before onlining, we synchronize on regionStates and, inside 
> that block, synchronize on regionPlans to clear the region plan. At the same 
> time, the thread getting a plan may first synchronize on regionPlans and then 
> on regionStates while getting the assignments of a server. This appeared after 
> the HBASE-12686 fix. The issue is present in branch-1 and branch-1.1 only.
> {code}
> "AM.-pool1-t33":
>       at org.apache.hadoop.hbase.master.AssignmentManager.clearRegionPlan(AssignmentManager.java:2917)
>       - waiting to lock <0x00000000d0147f70> (a java.util.TreeMap)
>       at org.apache.hadoop.hbase.master.AssignmentManager.regionOffline(AssignmentManager.java:3617)
>       at org.apache.hadoop.hbase.master.AssignmentManager.regionOffline(AssignmentManager.java:1402)
>       at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1734)
>       at org.apache.hadoop.hbase.master.AssignmentManager.forceRegionStateToOffline(AssignmentManager.java:1821)
>       at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1456)
>       at org.apache.hadoop.hbase.master.AssignCallable.call(AssignCallable.java:45)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at java.lang.Thread.run(Thread.java:745)
> "AM.-pool1-t29":
>       at org.apache.hadoop.hbase.master.RegionStates.getRegionAssignments(RegionStates.java:155)
>       - waiting to lock <0x00000000d010b250> (a org.apache.hadoop.hbase.master.RegionStates)
>       at org.apache.hadoop.hbase.master.AssignmentManager.getSnapShotOfAssignment(AssignmentManager.java:3629)
>       at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer.getRegionAssignmentsByServer(BaseLoadBalancer.java:1146)
>       at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer.createCluster(BaseLoadBalancer.java:959)
>       at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer.randomAssignment(BaseLoadBalancer.java:1010)
>       at org.apache.hadoop.hbase.master.AssignmentManager.getRegionPlan(AssignmentManager.java:2228)
>       - locked <0x00000000d0147f70> (a java.util.TreeMap)
>       at org.apache.hadoop.hbase.master.AssignmentManager.getRegionPlan(AssignmentManager.java:2185)
>       at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1905)
>       at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1464)
>       at org.apache.hadoop.hbase.master.AssignCallable.call(AssignCallable.java:45)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at java.lang.Thread.run(Thread.java:745)
> "AM.ZK.Worker-pool2-t41":
>       at org.apache.hadoop.hbase.master.AssignmentManager.clearRegionPlan(AssignmentManager.java:2917)
>       - waiting to lock <0x00000000d0147f70> (a java.util.TreeMap)
>       at org.apache.hadoop.hbase.master.AssignmentManager.regionOnline(AssignmentManager.java:1305)
>       at org.apache.hadoop.hbase.master.AssignmentManager$4.run(AssignmentManager.java:1196)
>       - locked <0x00000000d010b250> (a org.apache.hadoop.hbase.master.RegionStates)
>       at org.apache.hadoop.hbase.master.AssignmentManager$3.run(AssignmentManager.java:1142)
>       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at java.lang.Thread.run(Thread.java:745)
> {code}
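
The dump above shows a classic lock-order inversion: one thread holds the RegionStates monitor and waits for the regionPlans TreeMap, while another holds the TreeMap and waits for RegionStates. As a rough illustration only (this is not the attached patch and not HBase code), the sketch below shows one common remedy, acquiring both monitors in the same order on every path. The class LockOrderSketch, its fields, and its methods are hypothetical stand-ins that merely borrow names from the trace.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the two monitors seen in the thread dump; not HBase code. */
final class LockOrderSketch {
    // Stand-ins for the RegionStates object and the regionPlans TreeMap.
    private final Object regionStates = new Object();
    private final Map<String, String> regionPlans = new TreeMap<>();

    /** Online path: takes regionStates first, then regionPlans (same order as in the dump). */
    void regionOnline(String region) {
        synchronized (regionStates) {
            synchronized (regionPlans) {
                regionPlans.remove(region); // clear the plan once the region is online
            }
        }
    }

    /** Plan path, reordered in this sketch so it also takes regionStates before regionPlans. */
    String getRegionPlan(String region, String server) {
        synchronized (regionStates) {
            synchronized (regionPlans) {
                regionPlans.putIfAbsent(region, server);
                return regionPlans.get(region);
            }
        }
    }

    /** Number of plans currently stored. */
    int planCount() {
        synchronized (regionPlans) {
            return regionPlans.size();
        }
    }

    /** Runs both paths on two threads; returns true if both loops finish within the timeout. */
    static boolean runConcurrently(LockOrderSketch s, int iterations) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.execute(() -> {
            for (int i = 0; i < iterations; i++) {
                s.getRegionPlan("r" + i, "server1");
            }
        });
        pool.execute(() -> {
            for (int i = 0; i < iterations; i++) {
                s.regionOnline("r" + i);
            }
        });
        pool.shutdown();
        try {
            return pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        boolean finished = runConcurrently(new LockOrderSketch(), 10_000);
        System.out.println("finished within timeout: " + finished);
    }
}
```

Because both methods take the monitors in the same order, neither thread can hold one lock while waiting on a thread that holds the other. An alternative with the same effect would be to avoid nesting altogether, for example by clearing the plan after leaving the regionStates block; which approach suits the real AssignmentManager depends on invariants not visible in this message.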



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
