longping_jie created HBASE-27850:
------------------------------------
Summary: TimeoutIOException: Failed to get sync result after
300000 ms for txid=16920651960, WAL system stuck?
Key: HBASE-27850
URL: https://issues.apache.org/jira/browse/HBASE-27850
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 2.2.6
Environment: HBase 2.2.6, Hadoop 3.3.1
Reporter: longping_jie
Attachments: 49151.log1
On one node in an RSGroup that hosts only a single table, the write call queue
became blocked at a certain moment. From that moment on, the read and write QPS
of the table dropped to 0 and clients could neither read from nor write to the
table. Starting at the point in time when the RS call queue became blocked, the
following error was reported continuously in the log:
2023-05-08 12:42:27,310 ERROR [MemStoreFlusher.2] regionserver.MemStoreFlusher: Cache flush failed for region user_feature_v2,eacf_1658057555,1660314723816.2376cc2326b5372131cc530b115d959a.
org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to get sync result after 300000 ms for txid=16920651960, WAL system stuck?
    at org.apache.hadoop.hbase.regionserver.wal.SyncFuture.get(SyncFuture.java:155)
    at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.blockOnSync(AbstractFSWAL.java:743)
    at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.sync(AsyncFSWAL.java:625)
    at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.sync(AsyncFSWAL.java:602)
    at org.apache.hadoop.hbase.regionserver.HRegion.doSyncOfUnflushedWALChanges(HRegion.java:2754)
    at org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2691)
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2549)
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2523)
    at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2409)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:611)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:580)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:68)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:360)
    at java.lang.Thread.run(Thread.java:748)
The memstore data on the node could not be flushed, because the WAL sync that
the flush waits on never completed. Other metrics of the node were normal, and
HDFS was not under pressure. After the blocked node was restarted, the table
returned to normal.
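
To make the failure mode in the stack trace easier to follow, here is a minimal sketch of a bounded wait on a sync result that never arrives. It does not use any HBase classes: `SyncWaitSketch`, `waitForSync`, `describe`, and the use of `CompletableFuture` are assumptions for illustration only, and the sketch is not the actual `SyncFuture` implementation. Only the message text and the txid are taken from the log above; the 50 ms timeout is chosen so the example finishes quickly.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SyncWaitSketch {

    // Hypothetical helper: block until the sync result arrives or the timeout expires.
    public static long waitForSync(CompletableFuture<Long> syncResult, long txid, long timeoutMs)
            throws IOException {
        try {
            return syncResult.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Same wording as the message in the reported log, for comparison.
            throw new IOException("Failed to get sync result after " + timeoutMs
                    + " ms for txid=" + txid + ", WAL system stuck?");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("Interrupted while waiting for sync", e);
        } catch (ExecutionException e) {
            throw new IOException(e.getCause());
        }
    }

    // Hypothetical helper: returns "ok" on success, otherwise the error message.
    public static String describe(CompletableFuture<Long> syncResult, long txid, long timeoutMs) {
        try {
            waitForSync(syncResult, txid, timeoutMs);
            return "ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // A future that nothing ever completes stands in for a stuck WAL writer.
        CompletableFuture<Long> neverCompleted = new CompletableFuture<>();
        System.out.println(describe(neverCompleted, 16920651960L, 50));
    }
}
```

In this sketch the waiter only learns that the timeout elapsed, not why the result is missing, which is consistent with the flush thread reporting a timeout while the other node metrics and HDFS look healthy.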
--
This message was sent by Atlassian Jira
(v8.20.10#820010)