[jira] Commented: (HADOOP-2621) Memcache flush flushing every 60 secs with out considering the max memcache size

Jim Kellerman (JIRA) Wed, 16 Jan 2008 09:04:01 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559586#action_12559586
 ]


Jim Kellerman commented on HADOOP-2621:
---------------------------------------

There is an "optional cache flush period" which will cause the cache to be 
flushed every N seconds provided the memcache is not empty. If it is forcing 
too many cache flushes, adjusting this parameter may solve the problem.

The config parameter's name is 'hbase.regionserver.optionalcacheflushinterval' 
the default interval is 60 seconds. Units are in milliseconds.

> Memcache flush flushing every 60 secs with out considering the max memcache 
> size
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-2621
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2621
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>            Reporter: Billy Pearson
>             Fix For: 0.16.0
>
>
> looks like hbase is flushing all memcache to disk every 60 secs causing a lot 
> of work for the compactor to keep up because column gets its own mapfile and 
> every region is flushed at one time. This could be a vary large number of 
> mapfiles to write if a region server is hosting 100 regions all with milti 
> columns.
> Idea memcache flush
> keep all data in memory until memcache get larger then the conf size with 
> hbase.hregion.memcache.flush.size.
> When we reach this size we should flush the regions that are the largest 
> first stopping once we drop back below the memcache max size maybe 20% below 
> the max. This will to flush only as needed as each flush takes time to 
> compact when compaction runs on a region. while we are flushing a region we 
> should also be blocking new updates from happening on that region so the 
> region server does not get over ran when a high update load hits a region 
> server. By only blocking on the region we are flushing at that time other 
> regions will still be able to do updates this.
> We we still want to use the hbase.regionserver.optionalcacheflushinterval we 
> should set to to run once an hour so something like that so we can recover 
> memory from the memcache on region that do not have a lot updates in memory. 
> But running at the default set now of 60 secs is not so good for the 
> compactor if it has many regions to handle also not good for a scanner to 
> have to scan many small files vs a few larger ones
> Example a compactor may take 15 mins to compact a region in that time we will 
> flush 15 times causeing all other regions to get a new mapfile to compact 
> when it becomes it turn to get compacted if you had many regions getting 
> compacted the last one on the list of say 10 regions would have 10 regions * 
> 15 mins each = 150 mapfiles for each column in the last region written before 
> the compactor can get to it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-2621) Memcache flush flushing every 60 secs with out considering the max memcache size

Reply via email to