[
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215501#comment-14215501
]
zhangduo commented on HBASE-10201:
----------------------------------
{quote}
You could do something like Gaurav Menghani suggests above (see
https://issues.apache.org/jira/browse/HBASE-10201?focusedCommentId=14191203&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14191203)
where, rather than reporting on successful flush the highest sequenceid of
all of a region's memstores involved in the flush, when you flush only a
column family, you'd have to report one less than the oldest outstanding
edit still alive up in a column family memstore.
{quote}
Yes, this is what the patch does now. This is the approach with minimal impact
on the existing code.
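To make the reporting idea above concrete, here is a minimal standalone sketch (hypothetical names, not the actual HBase code): each column family tracks the sequence id of its oldest unflushed edit, and after a per-family flush the region may only report one less than the minimum oldest edit still alive in any remaining memstore.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: per-family tracking of the oldest unflushed edit,
// and the region-level "flushed up to" sequence id derived from it.
public class FlushSeqIdSketch {
  // family name -> sequence id of the oldest edit still in that memstore
  private final Map<String, Long> oldestUnflushedSeqIds = new HashMap<>();

  public void edit(String family, long seqId) {
    // remember only the first (oldest) unflushed edit per family
    oldestUnflushedSeqIds.putIfAbsent(family, seqId);
  }

  public void flush(String family) {
    // that family's edits are now on disk
    oldestUnflushedSeqIds.remove(family);
  }

  // Highest sequence id safe to report: everything at or below it is
  // durable, so WALs up to it can be archived.
  public long flushedSeqId(long currentSeqId) {
    long oldestAlive = oldestUnflushedSeqIds.values().stream()
        .min(Long::compare).orElse(currentSeqId + 1);
    return oldestAlive - 1;
  }

  public static void main(String[] args) {
    FlushSeqIdSketch region = new FlushSeqIdSketch();
    region.edit("small", 5);
    region.edit("large", 10);
    region.flush("small");
    // "large" still holds edit 10, so we may only report 9,
    // not the region's current sequence id.
    System.out.println(region.flushedSeqId(20)); // prints 9
  }
}
```

The point is that a per-family flush can no longer report the region's newest flushed sequence id; it is pinned by whichever unflushed family holds the oldest edit.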
{quote}
What if you did something much less involved; when there is pressure to flush,
flush the stores with the oldest edits until you've freed enough memory?
{quote}
I think we need to identify the reason why we need a flush. If we need a flush
because the memstore is too large, then flushing the large stores is enough. If
we need a flush because the oldest seqId still alive in a memstore is far
behind the current one (which means we have lots of WAL files that cannot be
archived), then we need to flush the store which holds the oldest seqId in its
memstore (or maybe just flush all the stores? simple but useful). Maybe I can
change the return value of shouldFlush from boolean to an enum to indicate the
reason why we need a flush.
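The enum idea above could look roughly like the following sketch. All names, thresholds, and the store-selection policy are illustrative assumptions, not the actual HBase API: shouldFlush reports *why* a flush is needed, and the caller picks stores accordingly (largest store under memory pressure, the store pinning the oldest WAL entry otherwise).

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of an enum-returning shouldFlush (illustrative
// names and thresholds, not the real HBase code).
public class FlushDecisionSketch {
  enum FlushReason { NONE, MEMSTORE_SIZE, TOO_MANY_WALS }

  static final long MEMSTORE_LIMIT = 100;  // illustrative size threshold
  static final long MAX_SEQID_LAG = 50;    // illustrative WAL-lag threshold

  static FlushReason shouldFlush(long totalMemstoreSize,
                                 long oldestUnflushedSeqId,
                                 long currentSeqId) {
    if (totalMemstoreSize >= MEMSTORE_LIMIT) {
      return FlushReason.MEMSTORE_SIZE;
    }
    if (currentSeqId - oldestUnflushedSeqId >= MAX_SEQID_LAG) {
      return FlushReason.TOO_MANY_WALS;
    }
    return FlushReason.NONE;
  }

  // store name -> { memstore size, oldest unflushed seqId }
  static List<String> pickStores(FlushReason reason, Map<String, long[]> stores) {
    switch (reason) {
      case MEMSTORE_SIZE:
        // flush the largest store to free the most memory
        return List.of(Collections.max(stores.entrySet(),
            Comparator.comparingLong((Map.Entry<String, long[]> e) -> e.getValue()[0])).getKey());
      case TOO_MANY_WALS:
        // flush the store pinning the oldest WAL entry
        return List.of(Collections.min(stores.entrySet(),
            Comparator.comparingLong((Map.Entry<String, long[]> e) -> e.getValue()[1])).getKey());
      default:
        return List.of();
    }
  }

  public static void main(String[] args) {
    Map<String, long[]> stores = new HashMap<>();
    stores.put("big",   new long[] {90, 40});
    stores.put("small", new long[] {10, 7});

    FlushReason r = shouldFlush(100, 7, 30);   // memory pressure
    System.out.println(r + " -> " + pickStores(r, stores));

    r = shouldFlush(50, 7, 60);                // WAL-archival pressure
    System.out.println(r + " -> " + pickStores(r, stores));
  }
}
```

Under memory pressure the sketch flushes the largest store ("big"), while under WAL pressure it flushes the store holding the oldest edit ("small"), which is exactly the distinction an enum return value would let the caller make.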
> Port 'Make flush decisions per column family' to trunk
> ------------------------------------------------------
>
> Key: HBASE-10201
> URL: https://issues.apache.org/jira/browse/HBASE-10201
> Project: HBase
> Issue Type: Improvement
> Components: wal
> Reporter: Ted Yu
> Assignee: zhangduo
> Priority: Critical
> Fix For: 2.0.0, 0.98.9, 0.99.2
>
> Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch,
> HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch,
> HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_2.patch,
> HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch,
> HBASE-10201_6.patch, HBASE-10201_7.patch
>
>
> Currently the flush decision is made using the aggregate size of all column
> families. When large and small column families co-exist, this causes many
> small flushes of the smaller CF. We need to make per-CF flush decisions.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)