[ https://issues.apache.org/jira/browse/PHOENIX-5645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Geoffrey Jacoby updated PHOENIX-5645:
-------------------------------------
    Description:
Phoenix's SCN feature has some problems, because HBase major compaction can remove Cells that have been deleted or whose TTL or max versions has caused them to be expired.

For example, IndexTool rebuilds and index scrutiny can both give strange, incorrect results if a major compaction occurs in the middle of their run. In the rebuild case, it's because we're rewriting "history" on the index at the same time that compaction is rewriting "history" by purging deleted and expired cells.

Create a new configuration property called "max lookback age", which declares that no data written more recently than the max lookback age will be compacted away. The max lookback age must be smaller than the TTL, and it should not be legal for a user to look back further in the past than the table's TTL.

Max lookback age by default will not be set, and the current behavior will be preserved. But if max lookback age is set, it will be enforced by the BaseScannerRegionObserver for all tables.

In the future, this should be contributed as a general feature to HBase for arbitrary tables. See HBASE-23602.

  was:
IndexTool rebuilds and index scrutiny can both give strange, incorrect results if a major compaction occurs in the middle of their run. In the rebuild case, it's because we're rewriting "history" on the index at the same time that compaction is rewriting "history" by purging deleted and expired cells.

In the case of scrutiny, it's because it does an SCN-based lookback, and if versions are purged on the index before their equivalent data table rows, you can get false errors.

Since in the new indexing path we already have a coprocessor on each index, it should override the compaction hook to shield rows newer than some configurable age from being purged during a major compaction.

In the future, this should be contributed as a general feature to HBase for arbitrary tables.

        Summary: BaseScannerRegionObserver should prevent compaction from purging very recently deleted cells  (was: GlobalIndexChecker should prevent compaction from purging very recently deleted cells)

> BaseScannerRegionObserver should prevent compaction from purging very recently deleted cells
> --------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-5645
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5645
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Geoffrey Jacoby
>            Assignee: Geoffrey Jacoby
>            Priority: Major
>         Attachments: PHOENIX-5645-4.x-HBase-1.5-v2.patch, PHOENIX-5645-4.x-HBase-1.5.patch, PHOENIX-5645-4.x-HBase-1.5.v3.patch
>
>          Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Phoenix's SCN feature has some problems, because HBase major compaction can remove Cells that have been deleted or whose TTL or max versions has caused them to be expired.
> For example, IndexTool rebuilds and index scrutiny can both give strange, incorrect results if a major compaction occurs in the middle of their run. In the rebuild case, it's because we're rewriting "history" on the index at the same time that compaction is rewriting "history" by purging deleted and expired cells.
> Create a new configuration property called "max lookback age", which declares that no data written more recently than the max lookback age will be compacted away. The max lookback age must be smaller than the TTL, and it should not be legal for a user to look back further in the past than the table's TTL.
> Max lookback age by default will not be set, and the current behavior will be preserved. But if max lookback age is set, it will be enforced by the BaseScannerRegionObserver for all tables.
> In the future, this should be contributed as a general feature to HBase for arbitrary tables. See HBASE-23602.
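
The rules in the description (protect recently written data, keep max lookback age below the TTL, reject lookbacks older than the TTL, and preserve current behavior when the property is unset) can be illustrated with a minimal sketch. This is not the code in the attached patches: the class `MaxLookbackSketch`, its method names, the use of milliseconds, and the convention that a value of 0 or less means "not set" are all assumptions made for illustration, and the sketch deliberately avoids any HBase or Phoenix API.

```java
// Hypothetical illustration of the "max lookback age" rules described above.
// None of these names come from the Phoenix patch; all times are epoch millis.
final class MaxLookbackSketch {

    private MaxLookbackSketch() {}

    /** Assumption: a non-positive value means the property is not set. */
    static boolean isEnabled(long maxLookbackMs) {
        return maxLookbackMs > 0;
    }

    /** The description requires max lookback age to be smaller than the TTL. */
    static void validate(long maxLookbackMs, long ttlMs) {
        if (isEnabled(maxLookbackMs) && maxLookbackMs >= ttlMs) {
            throw new IllegalArgumentException(
                "max lookback age (" + maxLookbackMs + " ms) must be smaller than TTL ("
                    + ttlMs + " ms)");
        }
    }

    /**
     * True if a cell written at cellTimestampMs falls inside the lookback
     * window and so must survive a major compaction, even if it has been
     * deleted or has expired. When the property is unset nothing is
     * protected, which preserves the current behavior.
     */
    static boolean isProtectedFromCompaction(long cellTimestampMs, long nowMs,
                                             long maxLookbackMs) {
        if (!isEnabled(maxLookbackMs)) {
            return false;
        }
        return cellTimestampMs >= nowMs - maxLookbackMs;
    }

    /** True if an SCN (point-in-time read) is no older than the table's TTL. */
    static boolean isScnAllowed(long scnMs, long nowMs, long ttlMs) {
        return scnMs >= nowMs - ttlMs;
    }
}
```

In this sketch a compaction hook would call `isProtectedFromCompaction` for each candidate cell and skip purging when it returns true, while the query path would call `isScnAllowed` before honoring an SCN. Whether the real implementation keys off the cell's write timestamp or the delete marker's timestamp is not stated in the description, so the choice of write timestamp here is also an assumption.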