[
https://issues.apache.org/jira/browse/HBASE-23602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011308#comment-17011308
]
Geoffrey Jacoby commented on HBASE-23602:
-----------------------------------------
[~larsh] - thanks for pointing me to HBASE-12363 -- that's useful background.
But I don't think KEEP_DELETED_CELLS=TTL quite gets me what I'm looking for.
I want to configure a moving window of time in which HBase's history is
complete and unaltered.
Calling this work item a TTL is probably misleading -- "minimum purge time"
might be a better name. A similar Phoenix feature I'm writing calls it "max
lookback age".
Regardless of TTL, VERSIONS, delete markers, or whatever, I want to be able to
set an age where no Put or Delete younger than that age is removed by major
compaction.
Say I have a min purge time of 5 days, and I have a table with no TTL and
VERSIONS => 2. I Put a Cell with key R at time T1 and then replace it twice,
at T2 and T3, with other Cells of key R, where Now - T1 < 5 days. With
VERSIONS => 2 alone, a major compaction is free to drop the T1 Cell. With the
min purge time in place, a raw Scan, or a Scan with an appropriate max time,
should still be able to see the earlier edits _regardless of what flushes or
compactions have run_, so long as it's done within the window.
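To make the scenario above concrete, here is a rough, self-contained Python
simulation of the retention rule. It is not HBase code: the names `Cell`,
`major_compact`, and `min_purge_age_ms` are hypothetical, and TTL and delete
markers are left out to keep the sketch short. It only models the idea that a
Cell younger than the min purge time survives compaction even when VERSIONS
alone would drop it.

```python
from dataclasses import dataclass

DAY_MS = 24 * 60 * 60 * 1000


@dataclass(frozen=True)
class Cell:
    row: str
    ts: int      # write timestamp in milliseconds
    value: str


def major_compact(cells, now_ms, max_versions, min_purge_age_ms):
    """Return the cells that survive a simulated major compaction.

    Assumed rule (a sketch, not the actual HBase implementation): a cell is
    kept if it is among the newest `max_versions` cells of its row, OR if it
    is younger than `min_purge_age_ms`.
    """
    by_row = {}
    for cell in cells:
        by_row.setdefault(cell.row, []).append(cell)

    survivors = []
    for row in sorted(by_row):
        newest_first = sorted(by_row[row], key=lambda c: c.ts, reverse=True)
        for rank, cell in enumerate(newest_first):
            within_versions = rank < max_versions
            inside_window = now_ms - cell.ts < min_purge_age_ms
            if within_versions or inside_window:
                survivors.append(cell)
    return survivors


# The example from the comment: three writes to row R, all under 5 days old.
NOW = 100 * DAY_MS
history = [
    Cell("R", NOW - 4 * DAY_MS, "v1"),  # T1
    Cell("R", NOW - 3 * DAY_MS, "v2"),  # T2
    Cell("R", NOW - 2 * DAY_MS, "v3"),  # T3
]

# With a 5-day min purge time, all three versions survive despite VERSIONS => 2.
with_window = major_compact(history, NOW, 2, 5 * DAY_MS)

# With no min purge time, VERSIONS => 2 drops the T1 cell.
without_window = major_compact(history, NOW, 2, 0)

# Two days later the T1 cell has aged out of the window and can be purged.
two_days_later = major_compact(history, NOW + 2 * DAY_MS, 2, 5 * DAY_MS)
```

The window moves with the clock: the same compaction run two days later is
allowed to drop the T1 Cell, which is the "long tail" purging the proposal
wants to preserve.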
Thanks to HBASE-10118, we already have a setting that does this for delete
markers -- I believe this is the same concept, just for all Mutations. Over at
PHOENIX-5645 I'm trying to accomplish the same thing using some messy
coprocessor magic, but it would be better to have a clean HBase-level
abstraction.
> TTL Before Which No Data is Purged
> ----------------------------------
>
> Key: HBASE-23602
> URL: https://issues.apache.org/jira/browse/HBASE-23602
> Project: HBase
> Issue Type: New Feature
> Reporter: Geoffrey Jacoby
> Assignee: Geoffrey Jacoby
> Priority: Major
> Fix For: 3.0.0, 2.3.0, 1.6.0
>
>
> HBase currently offers operators a choice. They can set
> KEEP_DELETED_CELLS=true and VERSIONS to max value, plus no TTL, and they will
> always have a complete history of all changes (but high storage costs and
> penalties to read performance). Or they can have KEEP_DELETED_CELLS=false and
> VERSIONS/TTL set to some reasonable values, but that means that major
> compactions can destroy the ability to do a consistent snapshot read of any
> prior time. (This limits the usefulness and correctness of, for example,
> Phoenix's SCN lookback feature.)
> I propose having a new TTL property to give a minimum age that an expired or
> deleted Cell would have to achieve before it could be purged. (I see that
> HBASE-10118 already does something similar for the delete markers
> themselves.)
> This would allow operators to have a consistent history for some finite
> amount of recent time while still purging out the "long tail" of obsolete /
> deleted versions.