[jira] Commented: (HBASE-2236) Upper bound of outstanding WALs can be overrun; take 2 (take 1 was hbase-2053)

stack (JIRA) Fri, 11 Jun 2010 21:47:40 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878211#action_12878211
 ]


stack commented on HBASE-2236:
------------------------------

This bug is already marked critical.  I think it is.  Was at a user site today 
where hdfs was yanked from under hbase.  Each server had 15-20 logs to 
proces... some even more.  We need to be more aggressive about cleaning up old 
logs.  

> Upper bound of outstanding WALs can be overrun; take 2 (take 1 was hbase-2053)
> ------------------------------------------------------------------------------
>
>                 Key: HBASE-2236
>                 URL: https://issues.apache.org/jira/browse/HBASE-2236
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Priority: Critical
>             Fix For: 0.21.0
>
>
> So hbase-2053 is not aggressive enough.  WALs can still overwhelm the upper 
> limit on log count.  While the code added by HBASE-2053, when done, will 
> ensure we let go of the oldest WAL, to do it, we might have to flush many 
> regions.  E.g:
> {code}
> 2010-02-15 14:20:29,351 INFO org.apache.hadoop.hbase.regionserver.HLog: Too 
> many hlogs: logs=45, maxlogs=32; forcing flush of 5 regions(s): 
> test1,193717,1266095474624, test1,194375,1266108228663, 
> test1,195690,1266095539377, test1,196348,1266095539377, 
> test1,197939,1266069173999
> {code}
> This takes time.  If we are taking on edits a furious rate, we might have 
> rolled the log again, meantime, maybe more than once.
> Also log rolls happen inline with a put/delete as soon as it hits the 64MB 
> (default) boundary whereas the necessary flushing is done in background by a 
> single thread and the memstore can overrun the (default) 64MB size.  Flushes 
> needed to release logs will be mixed in with "natural" flushes as memstores 
> fill.  Flushes may take longer than the writing of an HLog because they can 
> be larger.
> So, on an RS that is struggling the tendency would seem to be for a slight 
> rise in WALs.  Only if the RS gets a breather will the flusher catch up.
> If HBASE-2087 happens, then the count of WALs get a boost.
> Ideas to fix this for good would be :
> + Priority queue for queuing up flushes with those that are queued to free up 
> WALs having priority
> + Improve the HBASE-2053 code so that it will free more than just the last 
> WAL, maybe even queuing flushes so we clear all WALs such that we are back 
> under the maximum WALS threshold again.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HBASE-2236) Upper bound of outstanding WALs can be overrun; take 2 (take 1 was hbase-2053)

Reply via email to