[ 
https://issues.apache.org/jira/browse/PHOENIX-5645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Geoffrey Jacoby updated PHOENIX-5645:
-------------------------------------
    Description: 
Phoenix's SCN feature can return incorrect results, because HBase major 
compaction can remove Cells that have been deleted or that have expired under 
the TTL or max versions settings. 

For example, IndexTool rebuilds and index scrutiny can both give strange, 
incorrect results if a major compaction occurs in the middle of their run. In 
the rebuild case, it's because we're rewriting "history" on the index at the 
same time that compaction is rewriting "history" by purging deleted and expired 
cells. 

Create a new configuration property called "max lookback age", which guarantees 
that no data written within the max lookback age of the current time will be 
compacted away. The max lookback age must be smaller than the TTL, and a user 
should not be allowed to look back further into the past than the table's TTL. 

Max lookback age by default will not be set, and the current behavior will be 
preserved. But if max lookback age is set, it will be enforced by the 
BaseScannerRegionObserver for all tables. 
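
The rules above can be illustrated with a minimal Java sketch. It is not the 
code from the attached patches: the class name, method names, and the use of 
millisecond timestamps are all hypothetical, and the exact boundary handling 
(strict vs. inclusive comparison) is an assumption.

```java
import java.util.OptionalLong;

/**
 * Hypothetical sketch of the "max lookback age" rules from the description.
 * None of these names come from Phoenix or HBase.
 */
final class MaxLookbackSketch {
    private MaxLookbackSketch() {}

    /**
     * Rejects a configured max lookback age that is not smaller than the TTL.
     * An unset value is always valid and means "feature disabled".
     */
    static void validate(OptionalLong maxLookbackAgeMs, long ttlMs) {
        if (maxLookbackAgeMs.isPresent() && maxLookbackAgeMs.getAsLong() >= ttlMs) {
            throw new IllegalArgumentException(
                "max lookback age must be smaller than the TTL");
        }
    }

    /**
     * Returns true if a cell written at cellTimestampMs must survive a major
     * compaction running at nowMs, even if it is deleted or expired.
     * When the property is unset, nothing is protected (current behavior).
     */
    static boolean isProtectedFromCompaction(
            long cellTimestampMs, long nowMs, OptionalLong maxLookbackAgeMs) {
        if (!maxLookbackAgeMs.isPresent()) {
            return false;
        }
        // Assumption: a cell exactly at the boundary is no longer protected.
        return cellTimestampMs > nowMs - maxLookbackAgeMs.getAsLong();
    }

    /**
     * Returns true if an SCN-based read at scnMs is allowed, i.e. it does not
     * look back further into the past than the table's TTL.
     */
    static boolean isLookbackAllowed(long scnMs, long nowMs, long ttlMs) {
        return scnMs >= nowMs - ttlMs;
    }
}
```

With a max lookback age of 1000 ms and a compaction at time 10000, a cell 
written at 9500 would be kept and a cell written at 8000 would be eligible for 
purging; with the property unset, neither cell is protected. 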

In the future, this should be contributed as a general feature to HBase for 
arbitrary tables. See HBASE-23602.

  was:
IndexTool rebuilds and index scrutiny can both give strange, incorrect results 
if a major compaction occurs in the middle of their run. In the rebuild case, 
it's because we're rewriting "history" on the index at the same time that 
compaction is rewriting "history" by purging deleted and expired cells. 

In the case of scrutiny, it's because it does an SCN-based lookback, and if 
versions are purged on the index before their equivalent data table rows, you 
can get false errors. 

Since in the new indexing path we already have a coprocessor on each index, it 
should override the compaction hook to shield rows newer than some configurable 
age from being purged during a major compaction.

In the future, this should be contributed as a general feature to HBase for 
arbitrary tables. 

        Summary: BaseScannerRegionObserver should prevent compaction from 
purging very recently deleted cells (was: GlobalIndexChecker should prevent 
compaction from purging very recently deleted cells)

> BaseScannerRegionObserver should prevent compaction from purging very 
> recently deleted cells
> --------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-5645
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5645
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Geoffrey Jacoby
>            Assignee: Geoffrey Jacoby
>            Priority: Major
>         Attachments: PHOENIX-5645-4.x-HBase-1.5-v2.patch, 
> PHOENIX-5645-4.x-HBase-1.5.patch, PHOENIX-5645-4.x-HBase-1.5.v3.patch
>
>          Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Phoenix's SCN feature can return incorrect results, because HBase major 
> compaction can remove Cells that have been deleted or that have expired 
> under the TTL or max versions settings. 
> For example, IndexTool rebuilds and index scrutiny can both give strange, 
> incorrect results if a major compaction occurs in the middle of their run. In 
> the rebuild case, it's because we're rewriting "history" on the index at the 
> same time that compaction is rewriting "history" by purging deleted and 
> expired cells. 
> Create a new configuration property called "max lookback age", which 
> guarantees that no data written within the max lookback age of the current 
> time will be compacted away. The max lookback age must be smaller than the 
> TTL, and a user should not be allowed to look back further into the past 
> than the table's TTL. 
> Max lookback age by default will not be set, and the current behavior will be 
> preserved. But if max lookback age is set, it will be enforced by the 
> BaseScannerRegionObserver for all tables. 
> In the future, this should be contributed as a general feature to HBase for 
> arbitrary tables. See HBASE-23602.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
