[
https://issues.apache.org/jira/browse/HADOOP-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564825#action_12564825
]
stack commented on HADOOP-2636:
-------------------------------
On migration, one suggestion would be to up the version number on disk and then
have the script check for log files. If any are found, warn the user that they
need to revert their software to clean up the log files. If none are found,
just up the version number.
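The migration check suggested above could look something like the following. This is a hypothetical sketch, not the actual migration script; the method name, parameters, and messages are all illustrative. The idea is simply: refuse to bump the on-disk version while old log files remain, and tell the user to let the previous release clean them up first.

```java
import java.io.File;

// Illustrative sketch of the suggested migration check (not HBase code):
// only bump the on-disk version when no leftover log files are found.
class MigrationCheck {
    static String migrate(File[] logFiles, int currentVersion) {
        if (logFiles != null && logFiles.length > 0) {
            // Old logs present: the user must run the previous release
            // so it can clean up its own log files before migrating.
            return "Found " + logFiles.length + " log file(s); revert to the "
                + "previous software version to clean up logs before migrating.";
        }
        // No logs found: safe to just up the version number.
        return "Upgraded on-disk version to " + (currentVersion + 1);
    }
}
```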
Also, regarding my comment above where I punt the accounting of Store-level
memcaches to another issue -- i.e. "Hard part is keeping account of all the
memcaches in all the Stores on all the Regions on an HRS, but thats another
issue." -- I'm now thinking that's a mistake. My thinking is that Store-level
memcaches will have us reporting a regionserver as overloaded -- that is, that
its memory is occupied -- when in fact it's not.
When memcaches are at the Region level, a single memcache is used by all
Stores, no matter how many families. Making an accounting of allocated memory
is just a case of summing Region memcaches.
When memcaches are at the Store level, accounting becomes a sum of memcache
sizes across all families in the region, but in general usage, many of the
Store memcaches won't be used at all (the presumption is that it will be
unusual for us to be updating all Stores during an upload, outside of the
initial upload).
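The two accounting schemes described above can be sketched as follows. This is a minimal illustration, not HBase's implementation; the class and method names are made up. With Region-level memcaches the total is one sum over regions, while with Store-level memcaches we must sum every family's memcache in every region, even the mostly-empty ones.

```java
// Illustrative sketch of the two memcache-accounting schemes (not HBase API).
class MemcacheAccounting {

    // Region-level: one memcache per Region, so the total allocated
    // memory is just the sum of the per-Region memcache sizes.
    static long regionLevelTotal(long[] regionMemcacheSizes) {
        long total = 0;
        for (long size : regionMemcacheSizes) {
            total += size;
        }
        return total;
    }

    // Store-level: one memcache per family (Store) in each Region, so the
    // total is the sum over regions of the sum over that region's Stores --
    // including the many Store memcaches that hold little or nothing.
    static long storeLevelTotal(long[][] storeMemcacheSizesPerRegion) {
        long total = 0;
        for (long[] stores : storeMemcacheSizesPerRegion) {
            for (long size : stores) {
                total += size;
            }
        }
        return total;
    }
}
```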
A lesser, related issue is that Store-level caches will inevitably be smaller.
A Store that is being hammered will flush lots of small files. Lots of small
files instead of a few big files seems to be harder on the compactor.
> [hbase] Make cache flush triggering less simplistic
> ---------------------------------------------------
>
> Key: HADOOP-2636
> URL: https://issues.apache.org/jira/browse/HADOOP-2636
> Project: Hadoop Core
> Issue Type: Improvement
> Components: contrib/hbase
> Affects Versions: 0.16.0
> Reporter: stack
> Assignee: Jim Kellerman
> Fix For: 0.17.0
>
> Attachments: patch.txt, patch.txt, patch.txt, patch.txt, patch.txt
>
>
> When the flusher runs -- it's triggered when the sum of all Store memcaches
> in a Region exceeds a configurable max size -- we flush all Stores, though a
> Store memcache might have but a few bytes.
> I would think Stores should only dump their memcache to disk if they have
> some substance.
> The problem becomes more acute, the more families you have in a Region.
> Possible behaviors would be to dump the biggest Store only, or only those
> Stores > 50% of the max memcache size. Behavior would vary depending on the
> prompt that provoked the flush. We would also log why the flush is running:
> optional or > max size.
> This issue comes out of HADOOP-2621.
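The flush-selection behaviors the issue description proposes could be sketched as below. This is a hypothetical illustration, not the patch attached to this issue; names and the fallback choice are assumptions. It flushes only Stores above half the configured max memcache size and, if none qualify, falls back to the single biggest Store so the trigger still reclaims memory.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of a less simplistic flush trigger (not HBase code):
// pick which Stores to flush instead of flushing every Store in the Region.
class FlushPolicy {
    static List<Integer> storesToFlush(long[] storeSizes, long maxMemcacheSize) {
        List<Integer> picks = new ArrayList<>();
        int biggest = 0;
        for (int i = 0; i < storeSizes.length; i++) {
            // Flush only Stores holding more than 50% of the max size.
            if (storeSizes[i] > maxMemcacheSize / 2) {
                picks.add(i);
            }
            if (storeSizes[i] > storeSizes[biggest]) {
                biggest = i;
            }
        }
        if (picks.isEmpty() && storeSizes.length > 0) {
            // No Store has real substance: flush just the biggest one
            // rather than dumping tiny files from every Store.
            picks.add(biggest);
        }
        return picks;
    }
}
```

Flushing only the substantial Stores also addresses the small-files concern: a Store with a few bytes is never forced to write a tiny file for the compactor to churn through.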
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.