[
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215501#comment-14215501
]
zhangduo commented on HBASE-10201:
----------------------------------
{quote}
You could do something like Gaurav Menghani suggests above (see
https://issues.apache.org/jira/browse/HBASE-10201?focusedCommentId=14191203&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14191203)
where, rather than reporting on successful flush the highest sequenceid of
all of a region's memstores involved in the flush, when you flush only a
column family, you'd have to report one less than the oldest outstanding
edit still alive up in a column family memstore.
{quote}
Yes, this is what the patch does now. This is the approach with minimal impact
on the existing code.
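To make the reporting idea above concrete, here is a minimal standalone sketch (hypothetical names, not the actual HBase code): each column family tracks the sequence id of its oldest unflushed edit, and after a per-family flush the region may only report one less than the minimum oldest edit still alive in any remaining memstore.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: per-family tracking of the oldest unflushed edit,
// and the region-level "flushed up to" sequence id derived from it.
public class FlushSeqIdSketch {
  // family name -> sequence id of the oldest edit still in that memstore
  private final Map<String, Long> oldestUnflushedSeqIds = new HashMap<>();

  public void edit(String family, long seqId) {
    // remember only the first (oldest) unflushed edit per family
    oldestUnflushedSeqIds.putIfAbsent(family, seqId);
  }

  public void flush(String family) {
    // that family's edits are now on disk
    oldestUnflushedSeqIds.remove(family);
  }

  // Highest sequence id safe to report: everything at or below it is
  // durable, so WALs up to it can be archived.
  public long flushedSeqId(long currentSeqId) {
    long oldestAlive = oldestUnflushedSeqIds.values().stream()
        .min(Long::compare).orElse(currentSeqId + 1);
    return oldestAlive - 1;
  }

  public static void main(String[] args) {
    FlushSeqIdSketch region = new FlushSeqIdSketch();
    region.edit("small", 5);
    region.edit("large", 10);
    region.flush("small");
    // "large" still holds edit 10, so we may only report 9,
    // not the region's current sequence id.
    System.out.println(region.flushedSeqId(20)); // prints 9
  }
}
```

The point is that a per-family flush can no longer report the region's newest flushed sequence id; it is pinned by whichever unflushed family holds the oldest edit.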
{quote}
What if you did something much less involved; when there is pressure to flush,
flush the stores with the oldest edits until you've freed enough memory?
{quote}
I think we need to identify the reason why we need a flush. If we need a flush
because the memstore is too large, then flushing the large stores is enough. If
we need a flush because the oldest seqId still alive in a memstore is far
behind the current one (which means we have lots of WAL files that cannot be
archived), then we need to flush the store which holds the oldest seqId in its
memstore (or maybe just flush all the stores? simple but useful). Maybe I can
change the return value of shouldFlush from boolean to an enum to indicate the
reason why we need a flush.
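The enum idea above could look roughly like the following sketch. All names, thresholds, and the store-selection policy are illustrative assumptions, not the actual HBase API: shouldFlush reports *why* a flush is needed, and the caller picks stores accordingly (largest store under memory pressure, the store pinning the oldest WAL entry otherwise).

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of an enum-returning shouldFlush (illustrative
// names and thresholds, not the real HBase code).
public class FlushDecisionSketch {
  enum FlushReason { NONE, MEMSTORE_SIZE, TOO_MANY_WALS }

  static final long MEMSTORE_LIMIT = 100;  // illustrative size threshold
  static final long MAX_SEQID_LAG = 50;    // illustrative WAL-lag threshold

  static FlushReason shouldFlush(long totalMemstoreSize,
                                 long oldestUnflushedSeqId,
                                 long currentSeqId) {
    if (totalMemstoreSize >= MEMSTORE_LIMIT) {
      return FlushReason.MEMSTORE_SIZE;
    }
    if (currentSeqId - oldestUnflushedSeqId >= MAX_SEQID_LAG) {
      return FlushReason.TOO_MANY_WALS;
    }
    return FlushReason.NONE;
  }

  // store name -> { memstore size, oldest unflushed seqId }
  static List<String> pickStores(FlushReason reason, Map<String, long[]> stores) {
    switch (reason) {
      case MEMSTORE_SIZE:
        // flush the largest store to free the most memory
        return List.of(Collections.max(stores.entrySet(),
            Comparator.comparingLong((Map.Entry<String, long[]> e) -> e.getValue()[0])).getKey());
      case TOO_MANY_WALS:
        // flush the store pinning the oldest WAL entry
        return List.of(Collections.min(stores.entrySet(),
            Comparator.comparingLong((Map.Entry<String, long[]> e) -> e.getValue()[1])).getKey());
      default:
        return List.of();
    }
  }

  public static void main(String[] args) {
    Map<String, long[]> stores = new HashMap<>();
    stores.put("big",   new long[] {90, 40});
    stores.put("small", new long[] {10, 7});

    FlushReason r = shouldFlush(100, 7, 30);   // memory pressure
    System.out.println(r + " -> " + pickStores(r, stores));

    r = shouldFlush(50, 7, 60);                // WAL-archival pressure
    System.out.println(r + " -> " + pickStores(r, stores));
  }
}
```

Under memory pressure the sketch flushes the largest store ("big"), while under WAL pressure it flushes the store holding the oldest edit ("small"), which is exactly the distinction an enum return value would let the caller make.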
> Port 'Make flush decisions per column family' to trunk
> ------------------------------------------------------
>
> Key: HBASE-10201
> URL: https://issues.apache.org/jira/browse/HBASE-10201
> Project: HBase
> Issue Type: Improvement
> Components: wal
> Reporter: Ted Yu
> Assignee: zhangduo
> Priority: Critical
> Fix For: 2.0.0, 0.98.9, 0.99.2
>
> Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch,
> HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch,
> HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_2.patch,
> HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch,
> HBASE-10201_6.patch, HBASE-10201_7.patch
>
>
> Currently the flush decision is made using the aggregate size of all column
> families. When large and small column families co-exist, this causes many
> small flushes of the smaller CF. We need to make per-CF flush decisions.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)