March Wang created HBASE-27413:
----------------------------------

             Summary: MemStoreFlusher: Cache flush failed for region
                 Key: HBASE-27413
                 URL: https://issues.apache.org/jira/browse/HBASE-27413
             Project: HBase
          Issue Type: Bug
    Affects Versions: 2.2.6
         Environment: Hadoop 3.3.1

HBase 2.2.6

jdk: openjdk version "1.8.0_342"

 
            Reporter: March Wang


We are running an HBase cluster (Hadoop 3.3.1 + HBase 2.2.6) when clients 
frequently write a lot of data into this HBase cluster, we got the message 
"Cache flush failed for the region" error log. then got a lot of 
"responseTooSlow" logs. below are the sample logs of the "Cache flush failed 
for region" messages:
{code:java}
022-08-09 11:04:05,069 DEBUG [MemStoreFlusher.0] regionserver.HRegion: Flush 
status journal:
    Acquiring readlock on region at 1660067945028
    Running coprocessor pre-flush hooks at 1660067945028
    Obtaining lock to block concurrent updates at 1660067945028
    Preparing flush snapshotting stores in dfab869db13603b7800cf7b33d58a885 at 
1660067945028
    Finished memstore snapshotting 
ns1:htable1,77e1422d7b070b9ee3ac9466174655e884d41d87bbe9182e05661cf11d02298b,1653473214632.dfab869db13603b7800cf7b33d58a885.,
 syncing WAL and waiting on mvcc, flushsize=dataSize=28417, getHeapSize=59360, 
getOffHeapSize=0, getCellsCount=284 at 1660067945028
2022-08-09 11:04:05,070 ERROR [MemStoreFlusher.0] regionserver.MemStoreFlusher: 
Cache flush failed for region 
ns1:htable1,77e1422d7b070b9ee3ac9466174655e884d41d87bbe9182e05661cf11d02298b,1653473214632.dfab869db13603b7800cf7b33d58a885.
org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to get sync 
result after 300000 ms for txid=77949378, WAL system stuck?
    at 
org.apache.hadoop.hbase.regionserver.wal.SyncFuture.get(SyncFuture.java:145)
    at 
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.blockOnSync(AbstractFSWAL.java:722)
    at 
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.sync(AsyncFSWAL.java:616)
    at 
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.sync(AsyncFSWAL.java:593)
    at 
org.apache.hadoop.hbase.regionserver.HRegion.doSyncOfUnflushedWALChanges(HRegion.java:2754)
    at 
org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2691)
    at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2549)
    at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2523)
    at 
org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2413)
    at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:611)
    at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:580)
    at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:68)
    at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:360)
    at java.lang.Thread.run(Thread.java:748)
2022-08-09 11:04:05,070 INFO  [MemStoreFlusher.0] 
hbase.RangerAuthorizationCoprocessor: Unable to get remote Address
2022-08-09 11:04:05,071 DEBUG [MemStoreFlusher.0] regionserver.HRegion: Flush 
column family: ai of 326f48e574a65d7ba599a68dbf742603 because time of oldest 
edit=1660064392583 is > 3600000 from now =1660068245071
2022-08-09 11:04:05,072 INFO  [MemStoreFlusher.0] regionserver.HRegion: 
Flushing 326f48e574a65d7ba599a68dbf742603 1/3 column families, dataSize=111.73 
KB heapSize=184.23 KB; ai={dataSize=12.15 KB, heapSize=24.73 KB, offHeapSize=0 
B}
2022-08-09 11:04:05,275 WARN  
[RpcServer.default.FPBQ.Fifo.handler=42,queue=0,port=16020] ipc.RpcServer: 
(responseTooSlow): 
{"call":"Multi(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$MultiRequest)","multi.gets":0,"starttimems":"1660067945231","responsesize":"1659","method":"Multi","param":"region\u003d
 
ns1:htable12,a8f9942ef122d7bb0d0b0036cc5d776d030f9c06b7af0ebb2908edd5a4f43b4c,1460489972562.204291859a0abda7305f73ccb03b174c.,
 for 1 act 
\u003cTRUNCATED\u003e","processingtimems":300043,"client":"172.16.139.217:35044","queuetimems":0,"multi.servicecalls":0,"class":"HRegionServer","multi.mutations":1}
2022-08-09 11:04:05,561 WARN  
[RpcServer.default.FPBQ.Fifo.handler=14,queue=2,port=16020] ipc.RpcServer: 
(responseTooSlow): 
{"call":"Multi(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$MultiRequest)","multi.gets":0,"starttimems":"1660067945519","responsesize":"1659","method":"Multi","param":"region\u003d
 ns1:htable13,a00bd53d,1502528412031.f04f3634f3822a67dccf8286cfe3e19a., for 1 
action(s) and 1st row key\u003da0fd50edde8a734b63a39899f4572a8 
\u003cTRUNCATED\u003e","processingtimems":300042,"client":"172.16.139.215:38654","queuetimems":0,"multi.servicecalls":0,"class":"HRegionServer","multi.mutations":1}
 {code}
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to