[ https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13854725#comment-13854725 ]
Gaurav Menghani commented on HBASE-3149:
----------------------------------------
[[email protected]] Yes, we have deployed this, with selective flushing
disabled for now, since we haven't seen any aggregate benefit yet. The
heuristics I was thinking about concern which column families to flush when
none of them is above the per-CF flush threshold. E.g., if the memstore limit
is 128 MB and the flush threshold for a CF is 32 MB, there can be a case with
7-8 CFs where none of them is above 32 MB.
In that case, there are a couple of heuristics to choose from, such as: flush
the top N column families, flush only as many column families as needed to
free up 1/4 of the memstore, etc. The main benefit I see is that much less time
will be spent compacting the smaller CFs, since far fewer files will be created
for them. The trade-off is that bigger column families get flushed earlier than
before and end up with smaller files than without this change, but with the
right heuristics we can find a good balance.
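
To make the two fallback heuristics concrete, here is a minimal sketch in Python. It is not the code in the attached patch and does not use any HBase API. All function and parameter names are hypothetical, and the "free 1/4 of the memstore" fallback is assumed to pick the largest families first, since that frees the target with the fewest families.

```python
from typing import Dict, List

MB = 1024 * 1024


def _largest_first(sizes: Dict[str, int]) -> List[str]:
    # Order families by memstore size, largest first; ties broken by name.
    return sorted(sizes, key=lambda cf: (-sizes[cf], cf))


def top_n(sizes: Dict[str, int], n: int) -> List[str]:
    # Heuristic 1 (hypothetical name): flush the N largest families.
    return _largest_first(sizes)[:n]


def fewest_to_free(sizes: Dict[str, int], target_bytes: int) -> List[str]:
    # Heuristic 2 (hypothetical name): flush as few families as possible
    # while freeing at least target_bytes, taking the largest first.
    chosen: List[str] = []
    freed = 0
    for cf in _largest_first(sizes):
        if freed >= target_bytes:
            break
        chosen.append(cf)
        freed += sizes[cf]
    return chosen


def select_families(sizes: Dict[str, int], memstore_limit: int,
                    cf_threshold: int, fallback: str = "fraction",
                    n: int = 2, fraction: float = 0.25) -> List[str]:
    # Normal case: flush every family at or above the per-CF threshold.
    over = [cf for cf in _largest_first(sizes) if sizes[cf] >= cf_threshold]
    if over:
        return over
    # No family qualifies: fall back to one of the heuristics above.
    if fallback == "top_n":
        return top_n(sizes, n)
    return fewest_to_free(sizes, int(memstore_limit * fraction))
```

With a 128 MB limit, a 32 MB per-CF threshold, and eight families of 30, 24, 20, 16, 14, 10, 8 and 6 MB, no family crosses the threshold. The fraction fallback would then pick the two largest families (54 MB, enough to cover the 32 MB target), while a top-N fallback with N = 3 would pick the three largest.
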
> Make flush decisions per column family
> --------------------------------------
>
> Key: HBASE-3149
> URL: https://issues.apache.org/jira/browse/HBASE-3149
> Project: HBase
> Issue Type: Improvement
> Components: regionserver
> Reporter: Karthik Ranganathan
> Assignee: Gaurav Menghani
> Priority: Critical
> Fix For: 0.89-fb
>
> Attachments: 3149-trunk-v1.txt, Per-CF-Memstore-Flush.diff
>
>
> Today, the flush decision is made using the aggregate size of all column
> families. When large and small column families co-exist, this causes many
> small flushes of the smaller CF. We need to make per-CF flush decisions.