March Wang created HBASE-27413:
----------------------------------
Summary: MemStoreFlusher: Cache flush failed for region
Key: HBASE-27413
URL: https://issues.apache.org/jira/browse/HBASE-27413
Project: HBase
Issue Type: Bug
Affects Versions: 2.2.6
Environment: Hadoop 3.3.1
HBase 2.2.6
jdk: openjdk version "1.8.0_342"
Reporter: March Wang
We are running an HBase cluster (Hadoop 3.3.1 + HBase 2.2.6) when clients
frequently write a lot of data into this HBase cluster, we got the message
"Cache flush failed for the region" error log. then got a lot of
"responseTooSlow" logs. below are the sample logs of the "Cache flush failed
for region" messages:
{code:java}
022-08-09 11:04:05,069 DEBUG [MemStoreFlusher.0] regionserver.HRegion: Flush
status journal:
Acquiring readlock on region at 1660067945028
Running coprocessor pre-flush hooks at 1660067945028
Obtaining lock to block concurrent updates at 1660067945028
Preparing flush snapshotting stores in dfab869db13603b7800cf7b33d58a885 at
1660067945028
Finished memstore snapshotting
ns1:htable1,77e1422d7b070b9ee3ac9466174655e884d41d87bbe9182e05661cf11d02298b,1653473214632.dfab869db13603b7800cf7b33d58a885.,
syncing WAL and waiting on mvcc, flushsize=dataSize=28417, getHeapSize=59360,
getOffHeapSize=0, getCellsCount=284 at 1660067945028
2022-08-09 11:04:05,070 ERROR [MemStoreFlusher.0] regionserver.MemStoreFlusher:
Cache flush failed for region
ns1:htable1,77e1422d7b070b9ee3ac9466174655e884d41d87bbe9182e05661cf11d02298b,1653473214632.dfab869db13603b7800cf7b33d58a885.
org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to get sync
result after 300000 ms for txid=77949378, WAL system stuck?
at
org.apache.hadoop.hbase.regionserver.wal.SyncFuture.get(SyncFuture.java:145)
at
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.blockOnSync(AbstractFSWAL.java:722)
at
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.sync(AsyncFSWAL.java:616)
at
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.sync(AsyncFSWAL.java:593)
at
org.apache.hadoop.hbase.regionserver.HRegion.doSyncOfUnflushedWALChanges(HRegion.java:2754)
at
org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2691)
at
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2549)
at
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2523)
at
org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2413)
at
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:611)
at
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:580)
at
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:68)
at
org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:360)
at java.lang.Thread.run(Thread.java:748)
2022-08-09 11:04:05,070 INFO [MemStoreFlusher.0]
hbase.RangerAuthorizationCoprocessor: Unable to get remote Address
2022-08-09 11:04:05,071 DEBUG [MemStoreFlusher.0] regionserver.HRegion: Flush
column family: ai of 326f48e574a65d7ba599a68dbf742603 because time of oldest
edit=1660064392583 is > 3600000 from now =1660068245071
2022-08-09 11:04:05,072 INFO [MemStoreFlusher.0] regionserver.HRegion:
Flushing 326f48e574a65d7ba599a68dbf742603 1/3 column families, dataSize=111.73
KB heapSize=184.23 KB; ai={dataSize=12.15 KB, heapSize=24.73 KB, offHeapSize=0
B}
2022-08-09 11:04:05,275 WARN
[RpcServer.default.FPBQ.Fifo.handler=42,queue=0,port=16020] ipc.RpcServer:
(responseTooSlow):
{"call":"Multi(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$MultiRequest)","multi.gets":0,"starttimems":"1660067945231","responsesize":"1659","method":"Multi","param":"region\u003d
ns1:htable12,a8f9942ef122d7bb0d0b0036cc5d776d030f9c06b7af0ebb2908edd5a4f43b4c,1460489972562.204291859a0abda7305f73ccb03b174c.,
for 1 act
\u003cTRUNCATED\u003e","processingtimems":300043,"client":"172.16.139.217:35044","queuetimems":0,"multi.servicecalls":0,"class":"HRegionServer","multi.mutations":1}
2022-08-09 11:04:05,561 WARN
[RpcServer.default.FPBQ.Fifo.handler=14,queue=2,port=16020] ipc.RpcServer:
(responseTooSlow):
{"call":"Multi(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$MultiRequest)","multi.gets":0,"starttimems":"1660067945519","responsesize":"1659","method":"Multi","param":"region\u003d
ns1:htable13,a00bd53d,1502528412031.f04f3634f3822a67dccf8286cfe3e19a., for 1
action(s) and 1st row key\u003da0fd50edde8a734b63a39899f4572a8
\u003cTRUNCATED\u003e","processingtimems":300042,"client":"172.16.139.215:38654","queuetimems":0,"multi.servicecalls":0,"class":"HRegionServer","multi.mutations":1}
{code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)