[jira] [Commented] (HBASE-23181) Blocked WAL archive: "LogRoller: Failed to schedule flush of 8ee433ad59526778c53cc85ed3762d0b, because it is not online on us"

Guangxu Cheng (Jira) Wed, 16 Oct 2019 21:43:28 -0700


    [ https://issues.apache.org/jira/browse/HBASE-23181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953399#comment-16953399 ]


Guangxu Cheng commented on HBASE-23181:
---------------------------------------

Are you writing data with ASYNC_WAL? If so, this may be related to HBASE-23157.

> Blocked WAL archive: "LogRoller: Failed to schedule flush of 
> 8ee433ad59526778c53cc85ed3762d0b, because it is not online on us"
> ------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-23181
>                 URL: https://issues.apache.org/jira/browse/HBASE-23181
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Michael Stack
>            Priority: Major
>
> On a heavily loaded cluster, WAL count keeps rising and we can get into a 
> state where we are not rolling the logs off fast enough. In particular, there 
> is this interesting state at the extreme where we pick a region to flush 
> because 'Too many WALs' but the region is actually not online. As the WAL 
> count rises, we keep picking a region-to-flush that is no longer on the 
> server. This condition blocks our being able to clear WALs; eventually WALs 
> climb into the hundreds and the RS goes zombie with a full Call queue that 
> starts throwing CallQueueTooLargeExceptions (bad if this server is the one 
> carrying hbase:meta).
> Here is how it looks in the log:
> {code}
> # Here is region closing....
> 2019-10-16 23:10:55,897 INFO 
> org.apache.hadoop.hbase.regionserver.handler.UnassignRegionHandler: Closed 
> 8ee433ad59526778c53cc85ed3762d0b
> ....
> # Then soon after ...
> 2019-10-16 23:11:44,041 WARN org.apache.hadoop.hbase.regionserver.LogRoller: 
> Failed to schedule flush of 8ee433ad59526778c53cc85ed3762d0b, because it is 
> not online on us
> 2019-10-16 23:11:45,006 INFO 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL: Too many WALs; 
> count=45, max=32; forcing flush of 1 regions(s): 
> 8ee433ad59526778c53cc85ed3762d0b
> ...
> # Later...
> 2019-10-16 23:20:25,427 INFO 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL: Too many WALs; 
> count=542, max=32; forcing flush of 1 regions(s): 
> 8ee433ad59526778c53cc85ed3762d0b
> 2019-10-16 23:20:25,427 WARN org.apache.hadoop.hbase.regionserver.LogRoller: 
> Failed to schedule flush of 8ee433ad59526778c53cc85ed3762d0b, because it is 
> not online on us
> {code}
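A minimal, hypothetical Java sketch of the failure mode in the log above, assuming the WAL keeps a per-region "oldest unflushed sequence id" map and that a region close can leave a stale entry in it. The class `WalFlushPickerSketch` and all of its methods are invented for illustration and are not HBase's API; the real logic lives in `AbstractFSWAL` and `LogRoller` and may well differ. With the stale entry, the closed region keeps pinning the oldest WAL, so every "Too many WALs" pass picks it again and the flush can never be scheduled.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/**
 * Hypothetical model, NOT HBase's real classes: a WAL picks the region whose
 * unflushed edits pin the oldest WAL file and asks it to flush.
 */
public class WalFlushPickerSketch {
    // Assumed bookkeeping: region -> lowest sequence id not yet flushed.
    private final Map<String, Long> oldestUnflushedSeqId = new HashMap<>();
    private final Set<String> onlineRegions = new HashSet<>();

    void openRegion(String region) {
        onlineRegions.add(region);
    }

    void append(String region, long seqId) {
        // Only the oldest unflushed edit determines which WALs can be archived.
        oldestUnflushedSeqId.putIfAbsent(region, seqId);
    }

    // clearAccounting=false models the suspected bug: the closed region's
    // entry lingers in the WAL's bookkeeping after the region goes offline.
    void closeRegion(String region, boolean clearAccounting) {
        onlineRegions.remove(region);
        if (clearAccounting) {
            oldestUnflushedSeqId.remove(region);
        }
    }

    // "Too many WALs": choose the region holding the oldest unflushed edit.
    Optional<String> pickRegionToFlush(int walCount, int maxWals) {
        if (walCount <= maxWals) {
            return Optional.empty();
        }
        return oldestUnflushedSeqId.entrySet().stream()
                .min(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    // Returns true only if the flush could be scheduled on this server.
    boolean scheduleFlush(String region) {
        if (!onlineRegions.contains(region)) {
            System.out.println("Failed to schedule flush of " + region
                    + ", because it is not online on us");
            return false;
        }
        oldestUnflushedSeqId.remove(region); // flushed: its WALs become archivable
        return true;
    }

    // Closes one region, then lets the WAL count rise past the limit and
    // returns the region that the last flush attempt picked.
    static String simulate(boolean clearAccounting) {
        WalFlushPickerSketch wal = new WalFlushPickerSketch();
        String stuck = "8ee433ad59526778c53cc85ed3762d0b";
        wal.openRegion(stuck);
        wal.openRegion("other");
        wal.append(stuck, 10L);
        wal.append("other", 20L);
        wal.closeRegion(stuck, clearAccounting);
        String picked = "none";
        for (int walCount = 33; walCount <= 35; walCount++) {
            picked = wal.pickRegionToFlush(walCount, 32).orElse("none");
            if (wal.scheduleFlush(picked)) {
                break; // progress: WALs can be archived again
            }
        }
        return picked;
    }

    public static void main(String[] args) {
        System.out.println("stale accounting picks: " + simulate(false));
        System.out.println("cleared accounting picks: " + simulate(true));
    }
}
```

In the stale-accounting run the sketch prints the "not online on us" warning on every pass and keeps returning the closed region, mirroring the log; when the entry is cleared on close, the picker moves on to a region it can actually flush.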



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
