[jira] Commented: (HBASE-2053) Upper bound of outstanding WALs can be overrun

stack (JIRA) Sat, 02 Jan 2010 20:11:28 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795948#action_12795948
 ]


stack commented on HBASE-2053:
------------------------------

@j-d yes.  agreed.

> Upper bound of outstanding WALs can be overrun
> ----------------------------------------------
>
>                 Key: HBASE-2053
>                 URL: https://issues.apache.org/jira/browse/HBASE-2053
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: stack
>             Fix For: 0.21.0
>
>         Attachments: 2053-v2.patch, 2053.patch, 
> hbase-root-regionserver-server-2.log.2009-12-22.gz
>
>
> Kevin Peterson posted the following on hbase-user.  Of interest is the 
> link at the end, which points to logs of WAL rolls and removals.  In one place we 
> remove 70-plus logs because the outstanding edits have moved past the 
> outstanding sequence numbers -- so our basic WAL removal mechanism is working 
> -- but if you study the log, the tendency is a steady climb in the number of 
> logs.   HLog#cleanOldLogs needs to notice such an upward tendency and work 
> more aggressively at cleaning old logs in this case.  Here is Kevin's note:
> {code}
> On Tue, Dec 15, 2009 at 3:17 PM, Kevin Peterson <[email protected]> wrote:
> This makes some sense now. I currently have 2200 regions across 3 tables. My
> largest table accounts for about 1600 of those regions and is mostly active
> at one end of the keyspace -- our key is based on date, but data only
> roughly arrives in order. I also write to two secondary indexes, which have
> no pattern to the key at all. One of these secondary tables has 488 regions
> and the other has 96 regions.
> We write about 10M items per day to the main table (articles). All of these
> get written to one of the secondary indexes (article-ids). About a third get
> written to the other secondary index. Total volume of data is about 10GB /
> day written.
> I think the key is as you say that the regions aren't filled enough to
> flush. The articles table gets mostly written to near one end and I see
> splits happening regularly. The index tables have no pattern so the 10
> million writes get scattered across the different regions. I've looked more
> closely at a log file (linked below), and if I forget about my main table
> (which would tend to get flushed), and look only at the indexes, this seems
> to be what's happening:
> 1. Up to maxLogs HLogs, it doesn't do any flushes.
> 2. Once it gets above maxLogs, it will start flushing one region each time
> it creates a new HLog.
> 3. If the first HLog had edits for say 50 regions, it will need to flush the
> region with oldest edits 50 times before the HLog can be removed.
> If N is the number of regions getting written to, but not getting enough
> writes to flush on their own, then I think this converges to maxLogs + N
> logs on average. If I think of maxLogs as "number of logs to start flushing
> regions at" this makes sense.
> http://kdpeterson.net/paste/hbase-hadoop-regionserver-mi-prod-app35.ec2.biz360.com.log.2009-12-14
> {code}
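
The three steps in Kevin's note can be played out in a toy simulation. This is only a sketch under loud assumptions, not HBase code: `simulate`, `max_logs`, `n_regions` and `rolls` are hypothetical names, every region is assumed to receive exactly one edit per log, no region ever flushes on its own, and exactly one region (the one with the oldest unflushed edit) is flushed per roll once the log count exceeds `max_logs`.

```python
def simulate(max_logs, n_regions, rolls):
    """Toy model of WAL rolling; returns the log count after each roll.

    Assumptions (not taken from HBase source): one edit per region per
    log, no natural flushes, one forced flush per roll above max_logs.
    """
    seq = 0          # monotonically increasing sequence id
    oldest = {}      # region -> sequence id of its oldest unflushed edit
    logs = []        # one entry per live log: highest sequence id in it
    counts = []
    for _ in range(rolls):
        # Fill one log: scattered writes touch every region once.
        for region in range(n_regions):
            seq += 1
            oldest.setdefault(region, seq)
        logs.append(seq)  # roll the log

        # Above the limit: flush only the region with the oldest edit.
        if len(logs) > max_logs:
            victim = min(oldest, key=oldest.get)
            del oldest[victim]

        # A log can go once every edit in it has been flushed, i.e. its
        # highest sequence id is below the oldest unflushed edit overall.
        floor = min(oldest.values()) if oldest else seq + 1
        logs = [hi for hi in logs if hi >= floor]
        counts.append(len(logs))
    return counts


if __name__ == "__main__":
    print(simulate(2, 3, 7))
```

With `max_logs=2` and three regions the counts come out as `[1, 2, 3, 4, 2, 2, 2]`: the first log holds edits for all three regions, so it survives until the third forced flush, and the count overshoots the limit in the meantime. How closely the real long-run average tracks maxLogs + N depends on write patterns that this model does not capture.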

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
