[
https://issues.apache.org/jira/browse/HBASE-26256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17645071#comment-17645071
]
Haoze Wu commented on HBASE-26256:
----------------------------------
[~zhangduo] If you think my proposal makes sense, please leave your comments
on [https://github.com/apache/hbase/pull/4916]. Thanks!
> The potential delay of HDFS RPC in HRegion may cause data inconsistency and
> some HBase shell commands hanging
> -------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-26256
> URL: https://issues.apache.org/jira/browse/HBASE-26256
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 2.4.2
> Reporter: Haoze Wu
> Priority: Major
>
> When a RegionServer initializes a new region, it writes internal metadata
> (e.g., WAL) to the HDFS cluster. We find that this write can block because
> of network issues or overload on the HDFS side. The delay exposes
> inconsistent state to HBase clients and causes multiple HBase APIs to hang.
> *Reproduction*
> Steps to reproduce the symptom from scratch:
> # Start an HDFS cluster (1 NameNode + 2 DataNodes) with the default
> configuration.
> # Start a ZooKeeper cluster (3 nodes) with the default configuration.
> # Start an HBase cluster (1 Master + 2 RegionServers) with the default
> configuration.
> # In one of the RegionServers, introduce a delay by invoking `Thread.sleep`
> when it is creating its third region (alternatively, use a network packet
> loss injection tool like `tc`).
> # Right after the HBase cluster starts, the fault has not yet been
> triggered. Open the default HBase shell by running `bin/hbase shell` in the
> terminal, then repeatedly run the `create` command to create new tables
> until the fault is triggered.
>
> When the fault occurs, we observe the following symptoms:
> # The HBase shell running the `create` command hangs, without any log or
> warning.
> # If we start another HBase shell and run the `list` command to see all the
> tables, the new table appears in the result, even though it has not actually
> been created yet. Ideally, the client should not see this pending table
> before `create` succeeds.
> # If we start another HBase shell and run the `disable` command to disable
> this table, the HBase shell hangs, without any log or warning. Ideally, we
> should see an error or warning within a short time, because this table has
> not been created yet.
>
> The stack trace:
> {code:java}
> "RS_OPEN_REGION-regionserver/razor15:16022-0" #144 daemon prio=5 os_prio=0 tid=0x00007f4c34ed8000 nid=0x4463 waiting on condition [0x00007f4bfd496000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>         at java.lang.Thread.sleep(Native Method)
>         at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:1075)
>         at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:955)
>         at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:8081)
>         at org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:8040)
>         at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:8016)
>         at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7974)
>         at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7925)
>         at org.apache.hadoop.hbase.regionserver.handler.AssignRegionHandler.process(AssignRegionHandler.java:145)
>         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> {code}
>
> Relevant code snippet:
> {code:java}
> // file path: hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
> // class: org.apache.hadoop.hbase.regionserver.HRegion
> public class HRegion implements HeapSize, PropagatingConfigurationObserver, Region {
>   // ...
>   private long initializeRegionInternals(final CancelableProgressable reporter,
>       final MonitoredTask status) throws IOException {
>     // ...
>     if (!isRestoredRegion) {
>       // ...
>       if (RegionReplicaUtil.isDefaultReplica(getRegionInfo())) {
>         // ...
>         // At and only at the third time of invocation,
>         // invoke Thread.sleep, to simulate a delay of HDFS RPC
>         WALSplitUtil.writeRegionSequenceIdFile(getWalFileSystem(), getWALRegionDir(),
>           nextSeqId - 1);
>         // ...
>       }
>     }
>     // ...
>   }
>   // ...
> }
> {code}
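The "at and only at the third time of invocation" delay mentioned in the code comment above could be injected with a small counter-based helper. This is only a sketch: `DelayInjector`, `targetCall`, and `delayMillis` are hypothetical names introduced for illustration and are not part of HBase, and the issue does not show how the original experiment implemented the injection.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical test helper (not an HBase class): sleeps on exactly one chosen invocation. */
final class DelayInjector {
    private final AtomicInteger calls = new AtomicInteger();
    private final int targetCall;    // 1-based index of the invocation to delay
    private final long delayMillis;  // how long to sleep on that invocation

    DelayInjector(int targetCall, long delayMillis) {
        this.targetCall = targetCall;
        this.delayMillis = delayMillis;
    }

    /** Counts this invocation and sleeps if it is the target one. Returns true if it slept. */
    boolean maybeDelay() {
        if (calls.incrementAndGet() != targetCall) {
            return false;
        }
        try {
            Thread.sleep(delayMillis);
        } catch (InterruptedException e) {
            // Preserve the interrupt status for the caller instead of swallowing it.
            Thread.currentThread().interrupt();
        }
        return true;
    }
}
```

In the experiment described here, a shared instance such as `new DelayInjector(3, delayMillis)` would be consulted immediately before the `WALSplitUtil.writeRegionSequenceIdFile` call, so that only the third region opening is slowed down.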
> *Fix*
> We are not sure about the root causes of the inconsistency or of the
> blocking of other APIs. One potential simple fix is to protect the
> `WALSplitUtil.writeRegionSequenceIdFile` operation (or the HDFS RPCs inside
> it) with a timeout. We verified that throwing a timeout exception when the
> operation takes too long resolves the aforementioned symptoms.
>
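As an illustration of the timeout idea in the Fix section, a blocking call can be bounded by running it on a separate thread and waiting on the result with a deadline. The sketch below uses only `java.util.concurrent`; `TimedCall` and `callWithTimeout` are hypothetical names, and this is not necessarily how the linked pull request implements the fix.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper (not an HBase API): bounds a blocking call with a deadline. */
final class TimedCall {
    private TimedCall() {}

    /** Runs op on a daemon thread; throws IOException if it does not finish in time. */
    static <T> T callWithTimeout(Callable<T> op, long timeoutMillis) throws IOException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "timed-call");
            t.setDaemon(true); // a stuck call must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(op);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // best effort: interrupts the worker thread
            throw new IOException("operation timed out after " + timeoutMillis + " ms", e);
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();
            if (cause instanceof IOException) {
                throw (IOException) cause; // surface the original I/O failure unchanged
            }
            throw new IOException(cause);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            throw new IOException("interrupted while waiting for operation", e);
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Under these assumptions, the region-opening path would wrap the `WALSplitUtil.writeRegionSequenceIdFile(...)` call in a `Callable` and pass it to `callWithTimeout`, so that a slow HDFS RPC surfaces as an `IOException` instead of blocking indefinitely. Cancellation here only interrupts the worker thread; whether the underlying HDFS client call actually stops on interruption is a separate question that this sketch does not address.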