[
https://issues.apache.org/jira/browse/HBASE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878211#action_12878211
]
stack commented on HBASE-2236:
------------------------------
This bug is already marked critical. I think it is. Was at a user site today
where hdfs was yanked from under hbase. Each server had 15-20 logs to
proces... some even more. We need to be more aggressive about cleaning up old
logs.
> Upper bound of outstanding WALs can be overrun; take 2 (take 1 was hbase-2053)
> ------------------------------------------------------------------------------
>
> Key: HBASE-2236
> URL: https://issues.apache.org/jira/browse/HBASE-2236
> Project: HBase
> Issue Type: Bug
> Reporter: stack
> Priority: Critical
> Fix For: 0.21.0
>
>
> So hbase-2053 is not aggressive enough. WALs can still overwhelm the upper
> limit on log count. While the code added by HBASE-2053, when done, will
> ensure we let go of the oldest WAL, to do it, we might have to flush many
> regions. E.g:
> {code}
> 2010-02-15 14:20:29,351 INFO org.apache.hadoop.hbase.regionserver.HLog: Too
> many hlogs: logs=45, maxlogs=32; forcing flush of 5 regions(s):
> test1,193717,1266095474624, test1,194375,1266108228663,
> test1,195690,1266095539377, test1,196348,1266095539377,
> test1,197939,1266069173999
> {code}
> This takes time. If we are taking on edits a furious rate, we might have
> rolled the log again, meantime, maybe more than once.
> Also log rolls happen inline with a put/delete as soon as it hits the 64MB
> (default) boundary whereas the necessary flushing is done in background by a
> single thread and the memstore can overrun the (default) 64MB size. Flushes
> needed to release logs will be mixed in with "natural" flushes as memstores
> fill. Flushes may take longer than the writing of an HLog because they can
> be larger.
> So, on an RS that is struggling the tendency would seem to be for a slight
> rise in WALs. Only if the RS gets a breather will the flusher catch up.
> If HBASE-2087 happens, then the count of WALs get a boost.
> Ideas to fix this for good would be :
> + Priority queue for queuing up flushes with those that are queued to free up
> WALs having priority
> + Improve the HBASE-2053 code so that it will free more than just the last
> WAL, maybe even queuing flushes so we clear all WALs such that we are back
> under the maximum WALS threshold again.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.