[jira] [Commented] (HBASE-3149) Make flush decisions per column family

Gaurav Menghani (JIRA) Fri, 20 Dec 2013 17:17:39 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13854725#comment-13854725
 ]


Gaurav Menghani commented on HBASE-3149:
----------------------------------------

[[email protected]] Yes, we have deployed this, with selective flushing 
disabled for now, since we haven't seen any aggregate benefit yet. The 
heuristics I was thinking about concern which column families to flush 
when no column family is above the per-family flush threshold. E.g., if the 
memstore limit is 128 MB and the flush threshold for a CF is 32 MB, there 
can be a case with 7-8 CFs where none of them is above 32 MB. 
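
To make that case concrete, here is a minimal sketch. It is not the code in the attached patch; the class, the method names and the evenly spread 16 MB sizes are made up for illustration. Eight families of 16 MB each reach the 128 MB limit while a 32 MB per-CF threshold selects nothing:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from HBase.
class ThresholdSelection {
    static final long MB = 1024L * 1024L;

    /** Returns the families whose memstore size is above the per-CF threshold. */
    static List<String> familiesAboveThreshold(Map<String, Long> sizes, long perCfThreshold) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : sizes.entrySet()) {
            if (e.getValue() > perCfThreshold) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    /** Assumed example data: eight families holding 16 MB each. */
    static Map<String, Long> evenlySpreadExample() {
        Map<String, Long> sizes = new LinkedHashMap<>();
        for (int i = 1; i <= 8; i++) {
            sizes.put("cf" + i, 16 * MB);
        }
        return sizes;
    }

    /** Aggregate memstore size across all families. */
    static long totalSize(Map<String, Long> sizes) {
        long total = 0;
        for (long s : sizes.values()) {
            total += s;
        }
        return total;
    }
}
```

With this data, totalSize(...) equals the 128 MB limit, yet familiesAboveThreshold(..., 32 * MB) is empty, so a fallback rule is needed to decide what to flush.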

In that case, there are a couple of heuristics you can choose from, such as: 
flush the top N column families, flush only as few column families as needed 
to free up 1/4 of the memstore, etc. The main benefit I see is that much less 
time will be spent compacting the smaller CFs, since far fewer files will be 
created for them. The trade-off is that bigger column families get flushed 
earlier than before and so produce smaller files than without this change, but 
with the right heuristics we can find a good balance.
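
A rough sketch of the two fallback heuristics mentioned above, assuming per-family memstore sizes are available as a map. The class and method names are hypothetical and this is not what the patch implements; it only shows how "top N" and "fewest families that free a target amount" could be expressed:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not HBase code.
class FallbackFlushHeuristics {

    /** Family names ordered from largest memstore to smallest. */
    static List<String> largestFirst(Map<String, Long> sizes) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(sizes.entrySet());
        entries.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Long> e : entries) {
            names.add(e.getKey());
        }
        return names;
    }

    /** Heuristic 1: flush the N largest families. */
    static List<String> topN(Map<String, Long> sizes, int n) {
        List<String> ordered = largestFirst(sizes);
        int count = Math.min(Math.max(n, 0), ordered.size());
        return new ArrayList<>(ordered.subList(0, count));
    }

    /**
     * Heuristic 2: flush as few families as possible while freeing at least
     * bytesToFree. Taking the largest families first minimizes the count.
     */
    static List<String> fewestToFree(Map<String, Long> sizes, long bytesToFree) {
        List<String> chosen = new ArrayList<>();
        long freed = 0;
        for (String name : largestFirst(sizes)) {
            if (freed >= bytesToFree) {
                break;
            }
            chosen.add(name);
            freed += sizes.get(name);
        }
        return chosen;
    }
}
```

For the 1/4 rule, bytesToFree would be the memstore limit divided by four (32 MB for a 128 MB limit). Both variants leave the smaller families alone, which is where the expected saving in compaction time comes from.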

> Make flush decisions per column family
> --------------------------------------
>
>                 Key: HBASE-3149
>                 URL: https://issues.apache.org/jira/browse/HBASE-3149
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Karthik Ranganathan
>            Assignee: Gaurav Menghani
>            Priority: Critical
>             Fix For: 0.89-fb
>
>         Attachments: 3149-trunk-v1.txt, Per-CF-Memstore-Flush.diff
>
>
> Today, the flush decision is made using the aggregate size of all column 
> families. When large and small column families co-exist, this causes many 
> small flushes of the smaller CF. We need to make per-CF flush decisions.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
