[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Lars Hofhansl (JIRA) Tue, 16 Aug 2011 10:15:51 -0700

    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085853#comment-13085853 ]


Lars Hofhansl commented on HBASE-4071:
--------------------------------------

Thanks Stack.

re if (currentCount <= minVersions): in this case the count was already 
incremented by the maxversions check (which is just outside of the context 
provided by the diff).

re expunging expired rows on flush: Currently it seems this is done as an 
optimization.
The reasons why I believe this cannot be done if minversions>0 are: (1) The cache 
might not have all versions, so I cannot count the versions to determine the 
cut-off point. (2) Even if we have minversions=1, there is no guarantee that 
the versions in the cache include the latest one (puts could have been 
backdated).
In both cases I think only at compaction time do we have enough information to 
remove expired cells (if minversions is >0).
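
For illustration only, the compaction-time rule discussed above could be 
sketched as follows. This is not the code in MinVersions.diff: the class, 
method, and parameter names are hypothetical, and the expiry test 
(timestamp < now - ttl) is an assumption. It also assumes that all versions 
of a column are visible, newest first, which is exactly the information that 
is not guaranteed at flush time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not HBase code: decide which versions of one column
// survive when TTL, maxVersions and minVersions are all in effect.
class MinVersionsSketch {

    // timestampsNewestFirst: every version of the column, sorted newest first.
    // Assumption: a version counts as expired when ts < now - ttlMs.
    static List<Long> retain(List<Long> timestampsNewestFirst, long now,
                             long ttlMs, int minVersions, int maxVersions) {
        List<Long> kept = new ArrayList<>();
        int count = 0;
        for (long ts : timestampsNewestFirst) {
            count++;
            if (count > maxVersions) {
                break; // the maxVersions cap always applies
            }
            boolean expired = ts < now - ttlMs;
            if (expired && count > minVersions) {
                // Every remaining version is older, so it is expired as well
                // and lies beyond the minVersions floor.
                break;
            }
            kept.add(ts);
        }
        return kept;
    }

    public static void main(String[] args) {
        // now=1000, ttl=100: versions 800 and 700 are expired.
        List<Long> versions = Arrays.asList(990L, 950L, 800L, 700L);
        System.out.println(retain(versions, 1000L, 100L, 1, 3)); // prints [990, 950]
    }
}
```

With only expired versions present (for example 800 and 700 in the setup 
above) and minVersions=1, the sketch keeps just the newest one, 800. That 
answer is only correct if the list really contains the newest version of the 
column, which is why the count cannot be trusted on a partial view.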

re comments: Of course. That's why there is version control :)
I just left these in as comments to have a place where I can put a comment as 
to why I believe these are not necessary.

So you think test coverage of the existing functionality is sufficient? That is 
very good to know.
I'll add tests for the new functionality.

What's the general feeling? Should I aim for minimal intrusion or attempt to do 
a bit of refactoring to abstract these policies into an interface? Leaning towards 
the latter, but on the other hand the change would be more risky.

> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able 
> to restore the dataset from any point of time but only within a limited 
> timeframe -- say one week.  Thereafter, if the versions are older than one 
> week, rather than as we do with TTL where we let go of all versions older 
> than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
> it's like versions==1 when TTL > one week.  We want to allow that if an error 
> is caught within a week of its happening -- user mistakenly removes a 
> critical table -- then we'll be able to restore up to the moment just before 
> catastrophe hit; otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
