Geoffrey Jacoby created HBASE-23766:
---------------------------------------
Summary: Support Point-In-Time Queries
Key: HBASE-23766
URL: https://issues.apache.org/jira/browse/HBASE-23766
Project: HBase
Issue Type: New Feature
Reporter: Geoffrey Jacoby
Assignee: Geoffrey Jacoby
HBase currently offers a snapshot feature which allows operators to capture the
state of a table at a point in time in a way that can be cloned or queried in
the future. It's quite useful in some circumstances, but limited because it's a
heavyweight operation, and because it requires prior knowledge of the time you
want to capture.
Phoenix currently offers a feature called "SCN", which uses the max timestamp
on Scans to provide the illusion of a "lookback" query at a point in time. It's
imperfect, however, because HBase's filtering and cleanup logic for deletes,
max versions and TTLs can prevent users from seeing certain Cells they would
have been able to see at a previous point in time. Even PHOENIX-5645, and the
equivalent HBASE-23602, which try to control major compaction cleanup, don't
cover all edge cases completely. (For example, you can't see rows whose TTL has
expired now but hadn't back then. Same with max versions.)
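To make the TTL edge case concrete, here is a minimal toy model in plain Java. It is not HBase or Phoenix code: the Cell class, both method names, and the simplified TTL rule are assumptions made purely for illustration. It contrasts a max-timestamp read, where TTL is still evaluated against the current time, with a hypothetical read that evaluates TTL as of the lookback time.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only (not HBase code): why a max-timestamp scan is not a true lookback. */
public class LookbackSketch {

    /** Hypothetical, simplified stand-in for a cell version. */
    public static final class Cell {
        final String row;
        final long ts;
        final String value;

        public Cell(String row, long ts, String value) {
            this.row = row;
            this.ts = ts;
            this.value = value;
        }
    }

    /**
     * Models an SCN-style read: only cells with ts < maxTs are returned,
     * but TTL expiry is still judged against the current time.
     */
    public static List<Cell> scanWithMaxTimestamp(List<Cell> store, long maxTs, long now, long ttl) {
        List<Cell> out = new ArrayList<>();
        for (Cell c : store) {
            boolean inRange = c.ts < maxTs;       // upper bound treated as exclusive
            boolean aliveNow = now - c.ts <= ttl; // TTL relative to "now"
            if (inRange && aliveNow) {
                out.add(c);
            }
        }
        return out;
    }

    /** Models a point-in-time read: TTL expiry is judged as of the lookback time. */
    public static List<Cell> scanAsOf(List<Cell> store, long asOf, long ttl) {
        List<Cell> out = new ArrayList<>();
        for (Cell c : store) {
            boolean inRange = c.ts < asOf;
            boolean aliveThen = asOf - c.ts <= ttl; // TTL relative to the lookback time
            if (inRange && aliveThen) {
                out.add(c);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Cell> store = List.of(new Cell("r1", 100, "a"), new Cell("r2", 180, "b"));
        long ttl = 50, now = 200, lookback = 120;
        // r1 was 20 time units old at the lookback time, but is 100 units old now.
        System.out.println("max-timestamp read: " + scanWithMaxTimestamp(store, lookback, now, ttl).size());
        System.out.println("as-of read: " + scanAsOf(store, lookback, ttl).size());
    }
}
```

With these numbers the max-timestamp read returns no cells, because r1 has expired by now, while the as-of read returns r1. Max versions behaves analogously: versions trimmed since the lookback time are no longer visible to a max-timestamp read.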
There are useful non-Phoenix applications as well, such as a change stream that
shows before/after images, as DynamoDB offers.
Since full support will require new configuration options added not just to
major compaction, but also to the read pipeline, I'm filing this as an umbrella
JIRA so we can have smaller sub-tasks, rather than trying to cram everything
into HBASE-23602.