[ https://issues.apache.org/jira/browse/PHOENIX-5645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Geoffrey Jacoby updated PHOENIX-5645:
-------------------------------------
Description:
Phoenix's SCN (point-in-time query) feature can return incorrect results,
because HBase major compaction can remove Cells that have been deleted or that
have expired under the table's TTL or max versions setting.
For example, IndexTool rebuilds and index scrutiny can both give strange,
incorrect results if a major compaction occurs in the middle of their run. In
the rebuild case, it's because we're rewriting "history" on the index at the
same time that compaction is rewriting "history" by purging deleted and expired
cells.
To address this, create a new configuration property called "max lookback age",
which guarantees that no data written more recently than the max lookback age
will be compacted away. The max lookback age must be smaller than the TTL, and
a user must not be allowed to look back further in the past than the table's
TTL.
By default, max lookback age will not be set, and the current behavior will be
preserved. If max lookback age is set, it will be enforced by the
BaseScannerRegionObserver for all tables.
In the future, this should be contributed as a general feature to HBase for
arbitrary tables. See HBASE-23602.
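The rules proposed above can be sketched roughly as follows. This is only an
illustration of the intended semantics, not the code in the attached patches:
the class and method names are hypothetical, all times are assumed to be epoch
milliseconds, and a non-positive max lookback age is assumed to mean "not set".

```java
// Hypothetical sketch of the "max lookback age" rules; not Phoenix's actual code.
final class MaxLookbackSketch {

    // Assumption: a non-positive value means the property is not set.
    static boolean isSet(long maxLookbackAgeMs) {
        return maxLookbackAgeMs > 0;
    }

    // A cell written more recently than the max lookback age must survive
    // major compaction, even if it is deleted or otherwise expired.
    static boolean mustRetain(long cellTimestampMs, long nowMs, long maxLookbackAgeMs) {
        if (!isSet(maxLookbackAgeMs)) {
            return false; // property unset: current compaction behavior applies
        }
        return cellTimestampMs > nowMs - maxLookbackAgeMs;
    }

    // The max lookback age must be smaller than the table's TTL.
    static boolean isValidConfig(long maxLookbackAgeMs, long ttlMs) {
        return maxLookbackAgeMs < ttlMs;
    }

    // An SCN may not look back further in the past than the table's TTL.
    static boolean isScnAllowed(long scnMs, long nowMs, long ttlMs) {
        return scnMs >= nowMs - ttlMs;
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        long maxLookback = 60_000L; // 60 seconds
        long ttl = 600_000L;        // 10 minutes

        System.out.println(mustRetain(950_000L, now, maxLookback)); // inside the window
        System.out.println(mustRetain(900_000L, now, maxLookback)); // outside the window
        System.out.println(isValidConfig(maxLookback, ttl));
        System.out.println(isScnAllowed(300_000L, now, ttl));       // older than the TTL
    }
}
```

With these assumptions, a cell outside the lookback window is simply left to
the normal compaction rules; the sketch only answers whether the window
shields it.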
was:
IndexTool rebuilds and index scrutiny can both give strange, incorrect results
if a major compaction occurs in the middle of their run. In the rebuild case,
it's because we're rewriting "history" on the index at the same time that
compaction is rewriting "history" by purging deleted and expired cells.
In the case of scrutiny, it's because it does an SCN-based lookback, and if
versions are purged on the index before their equivalent data table rows, you
can get false errors.
Since in the new indexing path we already have a coprocessor on each index, it
should override the compaction hook to shield rows newer than some configurable
age from being purged during a major compaction.
In the future, this should be contributed as a general feature to HBase for
arbitrary tables.
Summary: BaseScannerRegionObserver should prevent compaction from
purging very recently deleted cells (was: GlobalIndexChecker should prevent
compaction from purging very recently deleted cells)
> BaseScannerRegionObserver should prevent compaction from purging very
> recently deleted cells
> --------------------------------------------------------------------------------------------
>
> Key: PHOENIX-5645
> URL: https://issues.apache.org/jira/browse/PHOENIX-5645
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Geoffrey Jacoby
> Assignee: Geoffrey Jacoby
> Priority: Major
> Attachments: PHOENIX-5645-4.x-HBase-1.5-v2.patch,
> PHOENIX-5645-4.x-HBase-1.5.patch, PHOENIX-5645-4.x-HBase-1.5.v3.patch
>
> Time Spent: 5h 40m
> Remaining Estimate: 0h