[ 
https://issues.apache.org/jira/browse/HBASE-11793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104956#comment-14104956
 ] 

Andrew Purtell commented on HBASE-11793:
----------------------------------------

bq. is it ok if we do not fix this but still add HBASE-11546 to 0.98.6 (as zk-less 
will be off by default)? The improvement in HBASE-11290 will try to make the 
locks more granular and supersede this one.

Sure, +1

> RegionStates shouldn't be locked while writing to META
> ------------------------------------------------------
>
>                 Key: HBASE-11793
>                 URL: https://issues.apache.org/jira/browse/HBASE-11793
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>            Reporter: Virag Kothari
>            Assignee: Virag Kothari
>
> The following scenario occurs with zk-less assignment.
> Two shutdown handler threads are running, one of which is the 
> MetaServerShutdownHandler.
> The ServerShutdownHandler thread, which is recovering a region server other 
> than the one hosting META, acquires the lock on RegionStates during the 
> serverOffline() operation. It keeps holding the lock while it is trying to 
> write to META (which is not assigned):
> {quote}
> Thread 118 (MASTER_SERVER_OPERATIONS-gsbl90723:50510-2):
>   State: TIMED_WAITING
>   Blocked count: 430
>   Waited count: 36755
>   Stack:
>     java.lang.Object.wait(Native Method)
>     org.apache.hadoop.hbase.client.AsyncProcess.waitForNextTaskDone(AsyncProcess.java:853)
>     org.apache.hadoop.hbase.client.AsyncProcess.waitForMaximumCurrentTasks(AsyncProcess.java:879)
>     org.apache.hadoop.hbase.client.AsyncProcess.waitUntilDone(AsyncProcess.java:892)
>     org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:968)
>     org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1252)
>     org.apache.hadoop.hbase.client.HTable.put(HTable.java:910)
>     org.apache.hadoop.hbase.master.RegionStateStore.updateRegionState(RegionStateStore.java:223)
>     org.apache.hadoop.hbase.master.RegionStates.updateRegionState(RegionStates.java:804)
>     org.apache.hadoop.hbase.master.RegionStates.updateRegionState(RegionStates.java:329)
>     org.apache.hadoop.hbase.master.RegionStates.updateRegionState(RegionStates.java:298)
>     org.apache.hadoop.hbase.master.RegionStates.regionOffline(RegionStates.java:449)
>     org.apache.hadoop.hbase.master.RegionStates.regionOffline(RegionStates.java:429)
>     org.apache.hadoop.hbase.master.RegionStates.serverOffline(RegionStates.java:498)
>     org.apache.hadoop.hbase.master.AssignmentManager.processServerShutdown(AssignmentManager.java:3404)
>     org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:214)
> {quote}
> In the meantime, the MetaServerShutdownHandler thread can't assign META 
> because it is blocked on the RegionStates lock:
> {quote}
> Thread 126 (MASTER_META_SERVER_OPERATIONS-gsbl90723:50510-0):
>         State: BLOCKED
>         Blocked count: 52
>         Waited count: 100
>         Blocked on org.apache.hadoop.hbase.master.RegionStates@7398b4c1
>         Blocked by 118 (MASTER_SERVER_OPERATIONS-gsbl90723:50510-2)
>         Stack:
>           org.apache.hadoop.hbase.master.RegionStates.clearLastAssignment(RegionStates.java:422)
>           org.apache.hadoop.hbase.master.RegionStates.logSplit(RegionStates.java:418)
>           org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.process(MetaServerShutdownHandler.java:79)
> {quote}
> Because the first thread cannot write to META, it keeps retrying (the 
> retry time is huge: hbase.client.retries.number*10) until it fails.
> During that time the MetaServerShutdownHandler is blocked. 
> The first thread also calls abort on the Master once it has failed but, to 
> aggravate the problem, the Master won't abort because it also wants to lock 
> RegionStates :)
> {quote}
>         Blocked on org.apache.hadoop.hbase.master.RegionStates@7398b4c1
>         Blocked by 118 (MASTER_SERVER_OPERATIONS-gsbl90723:50510-2)
>         Stack:
>           org.apache.hadoop.hbase.master.RegionStates.getRegionsInTransition(RegionStates.java:152)
>           org.apache.hadoop.hbase.master.AssignmentManager.updateRegionsInTransitionMetrics(AssignmentManager.java:3081)
>           org.apache.hadoop.hbase.master.HMaster.doMetrics(HMaster.java:751)
>           org.apache.hadoop.hbase.master.HMaster.loop(HMaster.java:738)
>           org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:607)
> {quote}
> It seems RegionStates shouldn't be locked while I/O is happening.
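
The closing suggestion in the description can be pictured as narrowing the lock scope: update the in-memory map under the monitor, release it, and only then do the slow write. The sketch below is an illustration of that general pattern, not the actual HBase fix. NarrowLockSketch, StateStore and persist() are hypothetical names invented for this example and do not exist in HBase.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: mirrors the shape of the problem, not the real RegionStates class. */
final class NarrowLockSketch {

    /** Hypothetical stand-in for the code that writes region state to META. */
    interface StateStore {
        void persist(String region, String state);
    }

    private final Map<String, String> states = new HashMap<>();
    private final StateStore store;

    NarrowLockSketch(StateStore store) {
        this.store = store;
    }

    /** Cheap in-memory read; it only contends with other in-memory operations. */
    synchronized String getState(String region) {
        return states.get(region);
    }

    /** Marks regions offline in memory under the lock, then persists outside it. */
    void serverOffline(List<String> regions) {
        List<String> toPersist = new ArrayList<>();
        synchronized (this) {
            for (String region : regions) {
                states.put(region, "OFFLINE");
                toPersist.add(region);
            }
        } // monitor released before any I/O starts

        for (String region : toPersist) {
            // Potentially slow or retrying write; no monitor is held here.
            store.persist(region, "OFFLINE");
        }
    }

    public static void main(String[] args) {
        NarrowLockSketch[] self = new NarrowLockSketch[1];
        // The fake store reports whether the caller still holds the monitor.
        self[0] = new NarrowLockSketch((region, state) ->
            System.out.println(region + " -> " + state
                + ", lock held during write: " + Thread.holdsLock(self[0])));
        self[0].serverOffline(Arrays.asList("region-a", "region-b"));
    }
}
```

The trade-off in this pattern is that the in-memory state and the persisted state can briefly disagree, and a failed write has to be reconciled afterwards. Whether that is acceptable for region assignment is a design question this sketch does not answer.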



--
This message was sent by Atlassian JIRA
(v6.2#6252)
