[jira] [Updated] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

Lars Hofhansl (JIRA) Sun, 14 Aug 2011 23:15:07 -0700

     [ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Lars Hofhansl updated HBASE-4071:
---------------------------------

    Attachment: MinVersions.diff

"A patch is worth a thousand words."

Here's an *idea* for a patch.

I'm still getting familiar with the HBase code, so please cut me some slack; I 
may have missed an entire subsystem.

I did some light testing both with Wildcard and Explicit Column Trackers, and 
things seem to work for me so far, including the shell.

Not too happy about the patch, though:
1. Expired rows can no longer be expunged when a Store's internal cache is 
flushed (when minversions > 0), although I don't see how that can be avoided, 
since not all versions are in the cache.
2. Makes the code harder to follow.
3. I just stubbed out the testing stuff. For now.
4. It's hardcoded (Todd won't like it :) )
5. Passes a MatchCode to ColumnTracker.checkColumn
6. The relation between TTL and Version just became more complicated.
7. Can I assume the Trackers deliver KVs in reverse time order?
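
To make the intended rule concrete, here is a minimal standalone sketch of the 
retention decision for the versions of a single cell. It is not the code in 
MinVersions.diff and uses no HBase classes: the class name MinVersionsSketch, 
the retain method, and its parameters are hypothetical names chosen for 
illustration. It assumes versions arrive newest first (see point 7) and it 
ignores the separate max-versions limit.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a hypothetical helper, not part of HBase or of the patch.
final class MinVersionsSketch {

    /**
     * Decides which versions of one cell survive.
     *
     * @param timestampsNewestFirst version timestamps, assumed sorted newest first
     * @param now                   current time, same unit as the timestamps
     * @param ttl                   time-to-live, same unit as the timestamps
     * @param minVersions           number of versions to keep even if expired
     * @return the timestamps that are retained, still newest first
     */
    static List<Long> retain(List<Long> timestampsNewestFirst,
                             long now, long ttl, int minVersions) {
        List<Long> kept = new ArrayList<>();
        for (long ts : timestampsNewestFirst) {
            boolean expired = now - ts > ttl;
            // Unexpired versions are always kept. An expired version is kept
            // only while fewer than minVersions versions have been kept so far.
            if (!expired || kept.size() < minVersions) {
                kept.add(ts);
            }
        }
        return kept;
    }
}
```

With minVersions = 1 this keeps every version younger than the TTL and, if all 
versions have expired, only the most recently written one. With minVersions = 0 
it degenerates to plain TTL behaviour.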

Please let me know if this is off track.

Thanks...


> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able 
> to restore the dataset from any point of time but only within a limited 
> timeframe -- say one week.  Thereafter, if the versions are older than one 
> week, rather than as we do with TTL where we let go of all versions older 
> than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
> it's like versions==1 when TTL > one week.  We want to allow that if an error 
> is caught within a week of its happening -- user mistakenly removes a 
> critical table -- then we'll be able to restore up to the moment just before 
> catastrophe hit; otherwise, we keep one version only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
