[
https://issues.apache.org/jira/browse/HBASE-26256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644632#comment-17644632
]
Haoze Wu commented on HBASE-26256:
----------------------------------
[~zhangduo] Thanks for your reply!
{quote}I think the only problem here is that, whether we should allow clients
to see the table when CreateTableProcedure is not finished yet.
{quote}
I agree with you on this point. This is exactly what I recently proposed in HBASE-27520.
My proposal is that we should not allow clients to see the table when
CreateTableProcedure is not finished.
Currently, at a high level, the table has 3 states: 1) disabled; 2) creating;
3) created.
Let's consider the behaviors users expect to see in 2 possible scenarios:
Scenario 1: if a table already exists and has been disabled (with the `disable`
command), then the `list` command should show this table. At this point the
table is in the "disabled" state, but the semantics of `list` are simply to
show all the tables we have, so we should show this table.
Scenario 2: if a table does not exist yet and is being created (by a client), then
the `list` command should not show this table until the creation is finished.
Otherwise there is an inconsistency: users can see the table but cannot
actually use it.
In short, we should show the table in scenario 1, and should not show the table
in scenario 2. However, currently the system shows the table in both scenarios.
I think the core problem is that the "disabled" state has two possible
meanings: 1) the table exists but is disabled; 2) the table does not exist yet
and is being created.
I think the system should be able to distinguish these 2 cases.
I have proposed a simple patch for this in HBASE-27520 (along with a PR). The basic
idea is to maintain a flag for each table that indicates whether it already exists.
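To make the idea concrete, here is a minimal sketch of the intended behavior. It is not the actual patch in HBASE-27520 and does not use any HBase classes; `TableLifecycle`, `TableCatalogSketch` and all method names are hypothetical and exist only for illustration. The point is that a table still being created is tracked separately from a disabled table, so `list` can hide the former and show the latter, and `disable` can fail fast on a table that does not exist yet.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical states; "CREATING" is kept distinct from "DISABLED".
enum TableLifecycle { CREATING, ENABLED, DISABLED }

// Hypothetical in-memory catalog, for illustration only (not an HBase class).
class TableCatalogSketch {
    private final Map<String, TableLifecycle> tables = new HashMap<>();

    // Called when a create request is accepted but has not finished yet.
    synchronized void beginCreate(String name) {
        if (tables.containsKey(name)) {
            throw new IllegalStateException("table exists or is being created: " + name);
        }
        tables.put(name, TableLifecycle.CREATING);
    }

    // Called once the creation has completed; only now does the table "exist".
    synchronized void finishCreate(String name) {
        if (tables.get(name) != TableLifecycle.CREATING) {
            throw new IllegalStateException("table is not being created: " + name);
        }
        tables.put(name, TableLifecycle.ENABLED);
    }

    // Fails fast instead of hanging when the table has not been created yet.
    synchronized void disable(String name) {
        TableLifecycle state = tables.get(name);
        if (state == null || state == TableLifecycle.CREATING) {
            throw new IllegalStateException("table does not exist yet: " + name);
        }
        tables.put(name, TableLifecycle.DISABLED);
    }

    // Scenario 1: disabled tables are listed. Scenario 2: pending tables are not.
    synchronized List<String> list() {
        List<String> visible = new ArrayList<>();
        for (Map.Entry<String, TableLifecycle> e : tables.entrySet()) {
            if (e.getValue() != TableLifecycle.CREATING) {
                visible.add(e.getKey());
            }
        }
        Collections.sort(visible);
        return visible;
    }
}
```

With this sketch, a table that was created and then disabled still appears in `list()`, while a table for which only `beginCreate` has run does not, and calling `disable` on it throws immediately.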
Thanks!
> The potential delay of HDFS RPC in HRegion may cause data inconsistency and
> some HBase shell commands hanging
> -------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-26256
> URL: https://issues.apache.org/jira/browse/HBASE-26256
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 2.4.2
> Reporter: Haoze Wu
> Priority: Major
>
> When a RegionServer is initializing a new region, it writes its internal
> metadata (e.g., WAL) to the HDFS cluster. We find that this write operation
> can be blocked by network issues or overload on the HDFS side,
> and the delay results in inconsistency visible to HBase clients and causes
> multiple HBase APIs to hang as well.
> *Reproduction*
> Steps to reproduce the symptom from scratch:
> # Start an HDFS cluster (1 NameNode + 2 DataNodes) with the default
> configuration.
> # Start a ZooKeeper cluster (3 nodes) with the default configuration.
> # Start an HBase cluster (1 Master + 2 RegionServers) with the default
> configuration.
> # In one of the RegionServers, introduce a delay by invoking `Thread.sleep`
> when it is creating its third region (alternatively, use a network packet
> loss injection tool like `tc`).
> # When the HBase cluster just gets started, the fault has not yet been
> triggered. We use the default HBase shell by running `bin/hbase shell` in the
> terminal. In the HBase shell, we repeatedly use the `create` command to
> create new tables, until the fault is triggered.
>
> When the fault occurs, we observe several symptoms as follows:
> # The HBase shell running the `create` command hangs, without any log or
> warning.
> # If we start another HBase shell and run the `list` command to see all the
> tables, we can see the table in the result. However, this table has actually
> not been created yet. Ideally the client should not see this pending table
> before `create` succeeds.
> # If we start another HBase shell and run the `disable` command to disable
> this table, the HBase shell will hang, without any log or warning. Ideally,
> we should see some error or warning within a short duration of time, because
> this table has not been created yet.
>
> The stack trace:
> {code:java}
> "RS_OPEN_REGION-regionserver/razor15:16022-0" #144 daemon prio=5 os_prio=0 tid=0x00007f4c34ed8000 nid=0x4463 waiting on condition [0x00007f4bfd496000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>         at java.lang.Thread.sleep(Native Method)
>         at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:1075)
>         at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:955)
>         at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:8081)
>         at org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:8040)
>         at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:8016)
>         at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7974)
>         at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7925)
>         at org.apache.hadoop.hbase.regionserver.handler.AssignRegionHandler.process(AssignRegionHandler.java:145)
>         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> {code}
>
> Relevant code snippet:
> {code:java}
> // file path: hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
> // class: org.apache.hadoop.hbase.regionserver.HRegion
> public class HRegion implements HeapSize, PropagatingConfigurationObserver, Region {
>   // ...
>   private long initializeRegionInternals(final CancelableProgressable reporter,
>       final MonitoredTask status) throws IOException {
>     // ...
>     if (!isRestoredRegion) {
>       // ...
>       if (RegionReplicaUtil.isDefaultReplica(getRegionInfo())) {
>         // ...
>         // At and only at the third time of invocation,
>         // invoke Thread.sleep, to simulate a delay of HDFS RPC
>         WALSplitUtil.writeRegionSequenceIdFile(getWalFileSystem(),
>             getWALRegionDir(), nextSeqId - 1);
>         // ...
>       }
>     }
>     // ...
>   }
>   // ...
> }
> {code}
> *Fix*
> We’re not quite sure about the root causes of the inconsistencies or the
> blocking of other APIs. One potential simple fix is to protect the
> `WALSplitUtil.writeRegionSequenceIdFile` operation (or the HDFS RPCs inside
> it) with a timeout. We checked that throwing a timeout exception when the
> operation takes too long resolves the aforementioned symptoms.
>
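As a rough illustration of the timeout-based fix mentioned at the end of the quoted description, the sketch below runs a blocking operation on a separate thread and gives up after a deadline. It is not the change that was tested for this issue and it does not call any HBase or HDFS API; `TimedCallSketch` and `callWithTimeout` are hypothetical names, and the operation being wrapped (for example the sequence-id file write) is represented by a plain `Callable`.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper for illustration only (not part of HBase).
final class TimedCallSketch {
    private TimedCallSketch() {}

    // Runs op on a daemon worker thread and waits at most timeoutMs for its result.
    static <T> T callWithTimeout(Callable<T> op, long timeoutMs) throws IOException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "timed-call-sketch");
            t.setDaemon(true); // a stuck call must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(op);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // best effort: interrupts the worker thread
            throw new IOException("operation timed out after " + timeoutMs + " ms", e);
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();
            if (cause instanceof IOException) {
                throw (IOException) cause; // preserve the original I/O failure
            }
            throw new IOException(cause);
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt(); // restore the interrupt flag
            throw new IOException("interrupted while waiting for operation", e);
        } finally {
            executor.shutdownNow();
        }
    }
}
```

One caveat with this approach: cancelling the future only interrupts the worker thread, so an underlying call that ignores interruption may keep running in the background even though the caller has already received the timeout exception.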