[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

stack (JIRA) Thu, 11 Dec 2014 17:17:37 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243546#comment-14243546
 ]


stack commented on HBASE-10201:
-------------------------------

[~jeffreyz] I'm referring to the case where there are three column families: 
one has edit #1, another has edit #2 (which came later), and the third has 
edit #3. If the policy then decides to flush the third CF, we'll write it out 
with a seqid of #3 while edits #1 and #2 are still in memory. We report to the 
master that our lowest number is #1, but the master crashes (so we lose the 
info that #1 is the earliest safe edit number). The RS hosting the three 
column families also crashes. On recovery, we open the region and see an hfile 
with seqid #3, so we set the region's current seqid to #4, even though #1 and 
#2 were never persisted. This is possible with this patch as is, especially 
when the policy is disconnected from flush.
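
To make the sequence concrete, here is a minimal sketch of the scenario as a 
toy model. It is not HBase code: the class PerFamilyFlushSketch and all of its 
methods are hypothetical names made up for illustration, and "recovery" is 
reduced to the single rule described above (next seqid = highest hfile seqid 
+ 1).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of a region with per-column-family flushes (hypothetical, not HBase code).
final class PerFamilyFlushSketch {
    // Edits held only in memory, keyed by column family.
    private final Map<String, List<Long>> memstore = new LinkedHashMap<>();
    // Highest seqid written to an hfile, keyed by column family.
    private final Map<String, Long> hfileMaxSeqId = new LinkedHashMap<>();

    void write(String family, long seqId) {
        memstore.computeIfAbsent(family, k -> new ArrayList<>()).add(seqId);
    }

    // Flush a single column family: its edits leave memory and land in an hfile.
    void flush(String family) {
        List<Long> edits = memstore.remove(family);
        if (edits != null && !edits.isEmpty()) {
            hfileMaxSeqId.merge(family, Collections.max(edits), Long::max);
        }
    }

    // What the region server would report as its earliest unflushed edit (-1 if none).
    long lowestUnflushedSeqId() {
        long lowest = -1L;
        for (List<Long> edits : memstore.values()) {
            for (long id : edits) {
                if (lowest == -1L || id < lowest) {
                    lowest = id;
                }
            }
        }
        return lowest;
    }

    // Assumed naive recovery rule: look only at hfiles, take the max seqid, add one.
    long naiveRecoveredSeqId() {
        long max = 0L;
        for (long id : hfileMaxSeqId.values()) {
            max = Math.max(max, id);
        }
        return max + 1;
    }

    // In-memory edits whose seqid is below the given value, in ascending order.
    List<Long> unpersistedBelow(long seqId) {
        List<Long> result = new ArrayList<>();
        for (List<Long> edits : memstore.values()) {
            for (long id : edits) {
                if (id < seqId) {
                    result.add(id);
                }
            }
        }
        Collections.sort(result);
        return result;
    }

    // The three-family scenario from the comment: only cf3 gets flushed.
    static PerFamilyFlushSketch scenario() {
        PerFamilyFlushSketch region = new PerFamilyFlushSketch();
        region.write("cf1", 1L);
        region.write("cf2", 2L);
        region.write("cf3", 3L);
        region.flush("cf3");
        return region;
    }

    public static void main(String[] args) {
        PerFamilyFlushSketch region = scenario();
        long recovered = region.naiveRecoveredSeqId();
        System.out.println("lowest unflushed: " + region.lowestUnflushedSeqId());
        System.out.println("recovered seqid:  " + recovered);
        System.out.println("never persisted:  " + region.unpersistedBelow(recovered));
    }
}
```

In this model the lowest unflushed edit is #1, the naive rule recovers to #4, 
and edits #1 and #2 sit below that number without ever having reached an 
hfile. The value from lowestUnflushedSeqId() is the only thing that says so, 
and in the scenario above it is lost with the master.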

bq. We need to pass flushed seqIds per store to master so that we can optimize 
recovery process but doesn't impact correctness.

This would not fix the above case?  The master might know that #3 was persisted 
and that column families 1 and 2 had edits less than #3, but if it crashes, 
we're back in the scenario described above (unless we persist the flush 
reports?)

Thanks.


> Port 'Make flush decisions per column family' to trunk
> ------------------------------------------------------
>
>                 Key: HBASE-10201
>                 URL: https://issues.apache.org/jira/browse/HBASE-10201
>             Project: HBase
>          Issue Type: Improvement
>          Components: wal
>            Reporter: Ted Yu
>            Assignee: zhangduo
>             Fix For: 1.0.0, 2.0.0
>
>         Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, 
> HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, 
> HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, 
> HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, 
> HBASE-10201_13.patch, HBASE-10201_14.patch, HBASE-10201_15.patch, 
> HBASE-10201_16.patch, HBASE-10201_17.patch, HBASE-10201_18.patch, 
> HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, 
> HBASE-10201_5.patch, HBASE-10201_6.patch, HBASE-10201_7.patch, 
> HBASE-10201_8.patch, HBASE-10201_9.patch, compactions.png, count.png, io.png, 
> memstore.png
>
>
> Currently the flush decision is made using the aggregate size of all column 
> families. When large and small column families co-exist, this causes many 
> small flushes of the smaller CF. We need to make per-CF flush decisions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
