[
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085853#comment-13085853
]
Lars Hofhansl commented on HBASE-4071:
--------------------------------------
Thanks Stack.
re if (currentCount <= minVersions): in this case the count was already
incremented by the maxversions check (which is just outside of the context
provided by the diff).
re expunging expired rows on flush: Currently it seems this is done as an
optimization.
The reasons why I believe this cannot be done if minversions>0 are: (1) The cache
might not have all versions, so I cannot count the versions to determine the
cut-off point. (2) Even if we have minversions=1, there is no guarantee that
the versions in the cache include the latest one (puts could have been
backdated).
In both cases I think only at compaction time do we have enough information to
remove expired cells (if minversions is >0).
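
To make the two points above concrete, here is a minimal sketch in Java. It is not the HBase code or the attached patch: the class RetentionSketch, the method survivors, and its parameters are hypothetical names used only for illustration. It assumes a version is kept if it has not expired or if it is among the newest minVersions versions, and shows that applying this rule to the cached versions alone can give a different answer than applying it to all versions of the cell.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RetentionSketch {
    /**
     * Hypothetical helper (not an HBase API). Returns the timestamps that
     * survive: a version is kept if it is within the TTL, or if it is among
     * the newest minVersions versions in the list it was given.
     */
    static List<Long> survivors(List<Long> timestamps, long now, long ttl, int minVersions) {
        List<Long> sorted = new ArrayList<>(timestamps);
        sorted.sort(Comparator.reverseOrder()); // newest first
        List<Long> kept = new ArrayList<>();
        int count = 0;
        for (long ts : sorted) {
            count++;
            boolean expired = now - ts > ttl;
            if (!expired || count <= minVersions) {
                kept.add(ts);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        long now = 1000, ttl = 100;
        // Assumed scenario: version 500 is already in a store file,
        // version 300 is a backdated put that is still in the cache.
        List<Long> allVersions = List.of(500L, 300L);
        List<Long> cacheOnly = List.of(300L);

        // With every version visible, only the latest expired version is kept.
        System.out.println(survivors(allVersions, now, ttl, 1)); // [500]
        // With only the cache visible, 300 looks like the latest version.
        System.out.println(survivors(cacheOnly, now, ttl, 1));   // [300]
    }
}
```

In this sketch the cache-only view cannot tell whether 300 is the latest version or a backdated put, which is the information that is only available once all versions are seen together.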
re comments: Of course. That's why there is version control :)
I just left these in as comments to have a place where I can put a comment as
to why I believe these are not necessary.
So you think test coverage of the existing functionality is sufficient? That is
very good to know.
I'll add tests for the new functionality.
What's the general feeling? Should I aim for minimal intrusion or attempt to do
a bit of refactoring to abstract these policies into an interface? I am leaning
towards the latter, but on the other hand the change would be more risky.
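
One possible shape for such an interface, purely as an illustration: all names below (PolicySketch, RetentionPolicy, keep, combined) are hypothetical and do not come from the patch or from HBase. The sketch assumes each policy decides on a version from its rank (1 = newest) and its age, and that the combined rule is "within maxVersions, and either unexpired or within minVersions".

```java
public class PolicySketch {
    /** Hypothetical policy: decides whether the version with the given rank (1 = newest) is kept. */
    interface RetentionPolicy {
        boolean keep(int versionRank, long ageMillis);
    }

    // Each existing check becomes a small policy of its own.
    static RetentionPolicy maxVersions(int max) {
        return (rank, age) -> rank <= max;
    }

    static RetentionPolicy ttl(long ttlMillis) {
        return (rank, age) -> age <= ttlMillis;
    }

    static RetentionPolicy minVersions(int min) {
        return (rank, age) -> rank <= min;
    }

    /** Assumed combination: keep = maxVersions AND (ttl OR minVersions). */
    static RetentionPolicy combined(int max, long ttlMillis, int min) {
        RetentionPolicy byMax = maxVersions(max);
        RetentionPolicy byTtl = ttl(ttlMillis);
        RetentionPolicy byMin = minVersions(min);
        return (rank, age) ->
            byMax.keep(rank, age) && (byTtl.keep(rank, age) || byMin.keep(rank, age));
    }

    public static void main(String[] args) {
        RetentionPolicy policy = combined(3, 100, 1);
        System.out.println(policy.keep(1, 500)); // newest version, expired
        System.out.println(policy.keep(2, 500)); // second version, expired
        System.out.println(policy.keep(2, 50));  // second version, not expired
    }
}
```

The trade-off is the one raised above: composing small policies makes each rule easier to test in isolation, but it touches more of the existing code than a minimal change would.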
> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
> Key: HBASE-4071
> URL: https://issues.apache.org/jira/browse/HBASE-4071
> Project: HBase
> Issue Type: New Feature
> Reporter: stack
> Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster. What we want is to be able
> to restore the dataset from any point of time but only within a limited
> timeframe -- say one week. Thereafter, if the versions are older than one
> week, rather than as we do with TTL where we let go of all versions older
> than TTL, instead, let go of all versions EXCEPT the last one written. So,
> it's like versions==1 when TTL > one week. We want to allow that if an error
> is caught within a week of its happening -- user mistakenly removes a
> critical table -- then we'll be able to restore up to the moment just before
> catastrophe hit; otherwise, we keep one version only.