[ 
https://issues.apache.org/jira/browse/HBASE-26256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17645071#comment-17645071
 ] 

Haoze Wu commented on HBASE-26256:
----------------------------------

[~zhangduo] If you think my proposal makes sense, please leave your 
comments on [https://github.com/apache/hbase/pull/4916]. Thanks!

> The potential delay of HDFS RPC in HRegion may cause data inconsistency and 
> some HBase shell commands hanging
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-26256
>                 URL: https://issues.apache.org/jira/browse/HBASE-26256
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 2.4.2
>            Reporter: Haoze Wu
>            Priority: Major
>
> When a RegionServer is initializing a new region, it writes its internal 
> metadata (e.g., WAL) to the HDFS cluster. We find that this write operation 
> can be blocked by network issues or overload on the HDFS side, and the 
> delay exposes an inconsistent view to HBase clients and causes multiple 
> HBase APIs to hang as well.
> *Reproduction*
>    Steps to reproduce the symptom from scratch:
>  # Start an HDFS cluster (1 NameNode + 2 DataNodes) with the default 
> configuration.
>  # Start a ZooKeeper cluster (3 nodes) with the default configuration.
>  # Start an HBase cluster (1 Master + 2 RegionServers) with the default 
> configuration.
>  # In one of the RegionServers, introduce a delay by invoking `Thread.sleep` 
> when it is creating its third region (alternatively, use a network packet 
> loss injection tool like `tc`).
>  # Right after the HBase cluster starts, the fault has not yet been 
> triggered. We use the default HBase shell by running `bin/hbase shell` in the 
> terminal. In the HBase shell, we repeatedly use the `create` command to 
> create new tables until the fault is triggered.
>  
> When the fault occurs, we observe the following symptoms:
>  # The HBase shell running the `create` command hangs, without any log or 
> warning.
>  # If we start another HBase shell and run the `list` command to see all the 
> tables, we can see the table in the result. However, this table has actually 
> not been created yet. Ideally the client should not see this pending table 
> before `create` succeeds. 
>  # If we start another HBase shell and run the `disable` command to disable 
> this table, the HBase shell hangs, without any log or warning. Ideally, 
> we should see an error or warning within a short time, because this table 
> has not been created yet.
>  
>     The stack trace:
> {code:java}
> "RS_OPEN_REGION-regionserver/razor15:16022-0" #144 daemon prio=5 os_prio=0 tid=0x00007f4c34ed8000 nid=0x4463 waiting on condition [0x00007f4bfd496000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>     at java.lang.Thread.sleep(Native Method)
>     at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:1075)
>     at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:955)
>     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:8081)
>     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:8040)
>     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:8016)
>     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7974)
>     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7925)
>     at org.apache.hadoop.hbase.regionserver.handler.AssignRegionHandler.process(AssignRegionHandler.java:145)
>     at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
>  
>    Relevant code snippet:
> {code:java}
> // file path: hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
> // class: org.apache.hadoop.hbase.regionserver.HRegion
> public class HRegion implements HeapSize, PropagatingConfigurationObserver, Region {
>   // ...
>   private long initializeRegionInternals(final CancelableProgressable reporter,
>       final MonitoredTask status) throws IOException {
>     // ...
>     if (!isRestoredRegion) {
>       // ...
>       if (RegionReplicaUtil.isDefaultReplica(getRegionInfo())) {
>         // ...
>         // At and only at the third time of invocation,
>         // invoke Thread.sleep, to simulate a delay of HDFS RPC
>         WALSplitUtil.writeRegionSequenceIdFile(getWalFileSystem(), getWALRegionDir(),
>           nextSeqId - 1);
>         // ...
>       }
>     }
>     // ...
>   }
>   // ...
> }
> {code}
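The fault injection described in the code comment above (a delay on the third invocation only) could be sketched roughly as follows. This is a standalone, hypothetical helper and not code from HBase: the class name `ThirdCallDelay` and the method `maybeDelay` are made up for illustration. In the reproduction, the equivalent call would presumably sit just before `WALSplitUtil.writeRegionSequenceIdFile`.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper (not part of HBase): delays exactly one invocation.
public class ThirdCallDelay {
  // Counts invocations; atomic because regions may be opened on several threads.
  private final AtomicInteger calls = new AtomicInteger();

  /** Sleeps for delayMs on the third invocation only; returns true if it slept. */
  public boolean maybeDelay(long delayMs) throws InterruptedException {
    if (calls.incrementAndGet() == 3) {
      Thread.sleep(delayMs); // stands in for a slow HDFS RPC
      return true;
    }
    return false;
  }

  public static void main(String[] args) throws InterruptedException {
    ThirdCallDelay injector = new ThirdCallDelay();
    for (int i = 1; i <= 4; i++) {
      boolean delayed = injector.maybeDelay(50);
      System.out.println("call " + i + " delayed=" + delayed);
    }
  }
}
```

A per-instance counter keeps the sketch easy to test; a real injection point inside `HRegion` would more likely need a static or otherwise shared counter so that it counts across all region instances.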
> *Fix*
> We’re not quite sure about the root causes of the inconsistencies or the 
> blocking of other APIs. One potentially simple fix is to protect the 
> `WALSplitUtil.writeRegionSequenceIdFile` operation (or the HDFS RPCs inside 
> it) with a timeout. We verified that throwing a timeout exception when the 
> operation takes too long resolves the aforementioned symptoms.
>  
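The timeout idea from the Fix section can be illustrated with a minimal sketch using only the JDK. It is not the change made in the pull request and does not use any HBase API: the class `TimeoutGuard` and the method `callWithTimeout` are hypothetical names, and the blocking call is a stand-in for `WALSplitUtil.writeRegionSequenceIdFile`. It also assumes the blocked operation responds to thread interruption, which may not hold for a real HDFS RPC.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper (not part of HBase): bounds the duration of a blocking call.
public class TimeoutGuard {

  /** Runs the call on a helper thread and throws IOException if it exceeds timeoutMs. */
  public static <T> T callWithTimeout(Callable<T> call, long timeoutMs) throws IOException {
    // Daemon thread so a stuck call cannot keep the JVM alive.
    ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
      Thread t = new Thread(r, "timeout-guard");
      t.setDaemon(true);
      return t;
    });
    Future<T> future = executor.submit(call);
    try {
      return future.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      future.cancel(true); // best effort: interrupts the helper thread
      throw new IOException("operation timed out after " + timeoutMs + " ms", e);
    } catch (ExecutionException e) {
      Throwable cause = e.getCause();
      if (cause instanceof IOException) {
        throw (IOException) cause; // surface the original I/O failure unchanged
      }
      throw new IOException(cause);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the caller's interrupt status
      throw new IOException("interrupted while waiting for operation", e);
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) throws IOException {
    // A call that finishes in time returns its result.
    String result = callWithTimeout(() -> "written", 1000);
    System.out.println("fast call returned: " + result);

    // A call that sleeps past the limit (simulated delay) fails with IOException.
    try {
      callWithTimeout(() -> {
        Thread.sleep(5000);
        return "late";
      }, 100);
    } catch (IOException e) {
      System.out.println("slow call failed: " + e.getMessage());
    }
  }
}
```

Surfacing the timeout as an `IOException` matches the `throws IOException` signature of `initializeRegionInternals` shown in the issue, so the caller could treat it like any other failed region open. Whether that is the right recovery path is a design question for the pull request.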



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
