Haoze Wu created HBASE-26256:
--------------------------------

             Summary: The potential delay of HDFS RPC in HRegion may cause data 
inconsistency and some HBase shell commands hanging
                 Key: HBASE-26256
                 URL: https://issues.apache.org/jira/browse/HBASE-26256
             Project: HBase
          Issue Type: Bug
          Components: regionserver
    Affects Versions: 2.4.2
            Reporter: Haoze Wu


When a RegionServer initializes a new region, it writes internal metadata 
(e.g., for the WAL) to the HDFS cluster. We find that this write can block 
because of network issues or overload on the HDFS side. The resulting delay 
exposes inconsistent state to HBase clients and causes multiple HBase APIs to 
hang.

*Reproduction*

   Steps to reproduce the symptom from scratch:
 # Start an HDFS cluster (1 NameNode + 2 DataNodes) with the default 
configuration.
 # Start a ZooKeeper cluster (3 nodes) with the default configuration.
 # Start an HBase cluster (1 Master + 2 RegionServers) with the default 
configuration.
 # In one of the RegionServers, introduce a delay by invoking `Thread.sleep` 
when it is creating its third region (alternatively, inject network packet 
loss with a tool such as `tc`).
 # Right after the HBase cluster starts, the fault has not yet been triggered. 
Open the default HBase shell by running `bin/hbase shell` in a terminal, and 
repeatedly run the `create` command to create new tables until the fault is 
triggered.

 

When the fault occurs, we observe the following symptoms:
 # The HBase shell running the `create` command hangs, without any log or 
warning.
 # If we start another HBase shell and run the `list` command, the result 
includes the table, even though its creation has not completed. Ideally, the 
client should not see this pending table before `create` succeeds.
 # If we start another HBase shell and run the `disable` command on this 
table, that shell also hangs, without any log or warning. Ideally, we should 
see an error or warning within a short time, because the table has not been 
created yet.

 

    The stack trace:

 
{code:java}
"RS_OPEN_REGION-regionserver/razor15:16022-0" #144 daemon prio=5 os_prio=0 tid=0x00007f4c34ed8000 nid=0x4463 waiting on condition [0x00007f4bfd496000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:1075)
    at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:955)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:8081)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:8040)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:8016)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7974)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7925)
    at org.apache.hadoop.hbase.regionserver.handler.AssignRegionHandler.process(AssignRegionHandler.java:145)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{code}
 

   Relevant code snippet:
{code:java}
// file path: hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
// class: org.apache.hadoop.hbase.regionserver.HRegion

public class HRegion implements HeapSize, PropagatingConfigurationObserver, Region {
  // ...
  private long initializeRegionInternals(final CancelableProgressable reporter,
      final MonitoredTask status) throws IOException {
    // ...
    if (!isRestoredRegion) {
      // ...
      if (RegionReplicaUtil.isDefaultReplica(getRegionInfo())) {
        // ...
        // At and only at the third time of invocation,
        // invoke Thread.sleep, to simulate a delay of HDFS RPC
        WALSplitUtil.writeRegionSequenceIdFile(getWalFileSystem(), getWALRegionDir(),
          nextSeqId - 1);
        // ...
      }
    }
    // ...
  }
  // ...
}
{code}
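
The "third invocation only" delay mentioned in the comment above can be injected with a small call counter. The following is a minimal sketch under our own assumptions, not necessarily the exact instrumentation used in the experiment; the class `NthCallDelay` and its method `maybeDelay` are hypothetical names and are not part of HBase.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical fault-injection helper: sleeps on exactly one chosen call. */
final class NthCallDelay {
    private final AtomicInteger calls = new AtomicInteger();
    private final int targetCall;   // 1-based index of the call to delay
    private final long delayMillis; // how long the chosen call sleeps

    NthCallDelay(int targetCall, long delayMillis) {
        this.targetCall = targetCall;
        this.delayMillis = delayMillis;
    }

    /**
     * Counts this call and sleeps only if it is the target call.
     * Returns true if this call was the delayed one, false otherwise.
     */
    boolean maybeDelay() {
        if (calls.incrementAndGet() != targetCall) {
            return false;
        }
        try {
            Thread.sleep(delayMillis);
        } catch (InterruptedException e) {
            // Preserve the interrupt status for the caller.
            Thread.currentThread().interrupt();
        }
        return true;
    }
}
```

A single shared instance, e.g. `new NthCallDelay(3, delayMillis)`, would be consulted immediately before the `WALSplitUtil.writeRegionSequenceIdFile` call, so that only the third region opened on that RegionServer is delayed.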
*Fix*

We are not sure about the root cause of the inconsistency or of the blocking 
of the other APIs. One potential simple fix is to protect the 
`WALSplitUtil.writeRegionSequenceIdFile` operation (or the HDFS RPCs inside it) 
with a timeout. We verified that throwing a timeout exception when the 
operation takes too long resolves the symptoms described above.
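
To illustrate the idea of guarding a blocking call with a timeout, here is a minimal, generic sketch using only `java.util.concurrent`. It is not the actual HBase patch; the class `TimedCall` and the method `callWithTimeout` are hypothetical names, and the choice to surface the timeout as an `IOException` is an assumption made because `initializeRegionInternals` already declares `throws IOException`.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper that bounds how long a blocking operation may take. */
final class TimedCall {
    private TimedCall() {}

    /** Runs op on a worker thread; fails with IOException after timeoutMillis. */
    static <T> T callWithTimeout(Callable<T> op, long timeoutMillis) throws IOException {
        // Daemon thread, so a stuck operation cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "timed-call");
            t.setDaemon(true);
            return t;
        });
        Future<T> future = executor.submit(op);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // best effort: interrupts the worker thread
            throw new IOException("operation exceeded " + timeoutMillis + " ms", e);
        } catch (ExecutionException e) {
            throw new IOException("operation failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            throw new IOException("interrupted while waiting for operation", e);
        } finally {
            executor.shutdownNow();
        }
    }
}
```

In the context of this issue, the `Callable` would wrap the `WALSplitUtil.writeRegionSequenceIdFile(...)` call. Note that cancelling the future only interrupts the worker thread; whether a blocked HDFS RPC actually stops on interrupt is a separate question that this sketch does not address.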

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
