Virag Kothari created HBASE-11793:
-------------------------------------

             Summary: RegionStates shouldn't be locked while writing to META
                 Key: HBASE-11793
                 URL: https://issues.apache.org/jira/browse/HBASE-11793
             Project: HBase
          Issue Type: Bug
          Components: Region Assignment
            Reporter: Virag Kothari
            Assignee: Virag Kothari


The following scenario occurs with ZK-less assignment.
Two shutdown handler threads are running, one of which is a MetaServerShutdownHandler.
The ServerShutdownHandler thread that is recovering a region server other than the
one hosting META acquires the lock on RegionStates during the serverOffline()
operation. It keeps holding that lock while it tries to write to META (which is not assigned):
{quote}
Thread 118 (MASTER_SERVER_OPERATIONS-gsbl90723:50510-2):
  State: TIMED_WAITING
  Blocked count: 430
  Waited count: 36755
  Stack:
    java.lang.Object.wait(Native Method)
    org.apache.hadoop.hbase.client.AsyncProcess.waitForNextTaskDone(AsyncProcess.java:853)
    org.apache.hadoop.hbase.client.AsyncProcess.waitForMaximumCurrentTasks(AsyncProcess.java:879)
    org.apache.hadoop.hbase.client.AsyncProcess.waitUntilDone(AsyncProcess.java:892)
    org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:968)
    org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1252)
    org.apache.hadoop.hbase.client.HTable.put(HTable.java:910)
    org.apache.hadoop.hbase.master.RegionStateStore.updateRegionState(RegionStateStore.java:223)
    org.apache.hadoop.hbase.master.RegionStates.updateRegionState(RegionStates.java:804)
    org.apache.hadoop.hbase.master.RegionStates.updateRegionState(RegionStates.java:329)
    org.apache.hadoop.hbase.master.RegionStates.updateRegionState(RegionStates.java:298)
    org.apache.hadoop.hbase.master.RegionStates.regionOffline(RegionStates.java:449)
    org.apache.hadoop.hbase.master.RegionStates.regionOffline(RegionStates.java:429)
    org.apache.hadoop.hbase.master.RegionStates.serverOffline(RegionStates.java:498)
    org.apache.hadoop.hbase.master.AssignmentManager.processServerShutdown(AssignmentManager.java:3404)
    org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:214)
{quote}
Meanwhile, the MetaServerShutdownHandler thread can't assign META because it is
blocked on the RegionStates lock:
{quote}
Thread 126 (MASTER_META_SERVER_OPERATIONS-gsbl90723:50510-0):
  State: BLOCKED
  Blocked count: 52
  Waited count: 100
  Blocked on org.apache.hadoop.hbase.master.RegionStates@7398b4c1
  Blocked by 118 (MASTER_SERVER_OPERATIONS-gsbl90723:50510-2)
  Stack:
    org.apache.hadoop.hbase.master.RegionStates.clearLastAssignment(RegionStates.java:422)
    org.apache.hadoop.hbase.master.RegionStates.logSplit(RegionStates.java:418)
    org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.process(MetaServerShutdownHandler.java:79)
{quote}
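To make the blocking concrete, here is a minimal, self-contained Java sketch of the pattern in the two traces above, under stated assumptions. FakeRegionStates and LockDuringIoDemo are hypothetical names, not HBase classes, and Thread.sleep stands in for the retrying HTable.put() to META. One thread holds the object monitor during the simulated write (TIMED_WAITING, like thread 118), and a second thread calling another synchronized method ends up BLOCKED (like thread 126).

```java
// Hypothetical stand-in for RegionStates; NOT the real HBase class.
final class FakeRegionStates {
    // Like serverOffline(): holds the object monitor while a slow META write
    // (simulated here with Thread.sleep) is in progress.
    synchronized void serverOfflineWithMetaWrite(long metaWriteMillis) throws InterruptedException {
        Thread.sleep(metaWriteMillis);
    }

    // Like clearLastAssignment(): cheap in-memory work that needs the same monitor.
    synchronized void clearLastAssignment() {
        // in-memory bookkeeping only (omitted in this sketch)
    }
}

final class LockDuringIoDemo {
    /** Returns true if the second thread is observed BLOCKED on the monitor. */
    static boolean metaHandlerGetsBlocked() throws InterruptedException {
        FakeRegionStates states = new FakeRegionStates();
        Thread ssh = new Thread(() -> {
            try {
                states.serverOfflineWithMetaWrite(2_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "ServerShutdownHandler");
        ssh.start();
        // TIMED_WAITING means ssh is sleeping inside the synchronized method,
        // i.e. it already owns the monitor.
        while (ssh.isAlive() && ssh.getState() != Thread.State.TIMED_WAITING) {
            Thread.sleep(1);
        }
        Thread metaSsh = new Thread(states::clearLastAssignment, "MetaServerShutdownHandler");
        metaSsh.start();
        boolean blocked = false;
        long deadline = System.currentTimeMillis() + 1_000;
        while (!blocked && System.currentTimeMillis() < deadline) {
            blocked = metaSsh.getState() == Thread.State.BLOCKED;
            if (!blocked) {
                Thread.sleep(1);
            }
        }
        ssh.interrupt(); // cut the simulated META write short so both threads finish
        ssh.join();
        metaSsh.join();
        return blocked;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("META handler blocked: " + metaHandlerGetsBlocked());
    }
}
```

In the real incident nothing interrupts the writer, so the second thread stays blocked for the full retry period.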

Because the first thread can't write to META, it keeps retrying (the
retry time is huge: hbase.client.retries.number*10) until it fails.
During that time the MetaServerShutdownHandler is blocked.
The first thread also calls abort on the Master once it fails, but to aggravate
the problem, the Master won't abort because it also wants to lock RegionStates :)

{quote}
  Blocked on org.apache.hadoop.hbase.master.RegionStates@7398b4c1
  Blocked by 118 (MASTER_SERVER_OPERATIONS-gsbl90723:50510-2)
  Stack:
    org.apache.hadoop.hbase.master.RegionStates.getRegionsInTransition(RegionStates.java:152)
    org.apache.hadoop.hbase.master.AssignmentManager.updateRegionsInTransitionMetrics(AssignmentManager.java:3081)
    org.apache.hadoop.hbase.master.HMaster.doMetrics(HMaster.java:751)
    org.apache.hadoop.hbase.master.HMaster.loop(HMaster.java:738)
    org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:607)
{quote}

It seems RegionStates shouldn't be locked while I/O is in progress.
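A minimal sketch of that direction, assuming the in-memory update and the META write can be separated: mutate the in-memory map inside a short synchronized block, then perform the META write after the monitor is released. RegionStatesSketch, MetaWriter and NonBlockingReadDemo are hypothetical names, not HBase APIs, and a CountDownLatch simulates a META put stuck in retries. This is not the actual fix for this issue, only an illustration of the locking discipline.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;

// Hypothetical abstraction over the META write (not an HBase API).
interface MetaWriter {
    void write(String region, String state) throws InterruptedException;
}

// Hypothetical sketch, not the real RegionStates class.
final class RegionStatesSketch {
    private final Map<String, String> states = new HashMap<>();
    private final MetaWriter meta;

    RegionStatesSketch(MetaWriter meta) {
        this.meta = meta;
    }

    void regionOffline(String region) throws InterruptedException {
        // Hold the monitor only for the cheap in-memory update.
        synchronized (this) {
            states.put(region, "OFFLINE");
        }
        // The slow, possibly retrying META write runs with the monitor released.
        meta.write(region, "OFFLINE");
    }

    synchronized String getState(String region) {
        return states.get(region);
    }
}

final class NonBlockingReadDemo {
    /** Reads region state while a simulated META write is still in progress. */
    static String readWhileMetaWriteInProgress() throws InterruptedException {
        CountDownLatch writeStarted = new CountDownLatch(1);
        CountDownLatch releaseWrite = new CountDownLatch(1);
        RegionStatesSketch rs = new RegionStatesSketch((region, state) -> {
            writeStarted.countDown();
            releaseWrite.await(); // simulated META put stuck in retries
        });
        Thread ssh = new Thread(() -> {
            try {
                rs.regionOffline("region-a");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "ServerShutdownHandler");
        ssh.start();
        writeStarted.await();
        // If the lock were held across the write, this call would block here.
        String seen = rs.getState("region-a");
        releaseWrite.countDown();
        ssh.join();
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("state seen during META write: " + readWhileMetaWriteInProgress());
    }
}
```

Updating memory before the META write is durable raises an ordering question (what readers should see if the write ultimately fails), which a real change to RegionStates would have to address.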




--
This message was sent by Atlassian JIRA
(v6.2#6252)
