[ 
https://issues.apache.org/jira/browse/PHOENIX-7945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ujjawal Kumar updated PHOENIX-7945:
-----------------------------------
    Description: 
Orphaned delete markers (DeleteFamily markers without corresponding puts) are 
dropped during major compaction before Phoenix CompactionScanner can process 
them.

h2. Issue

HBase {{DropDeletesCompactionScanQueryMatcher.tryDropDelete()}} drops delete 
markers whose timestamp is earlier than {{earliestPutTs}} when 
{{KeepDeletedCells=TTL}} and {{timeToPurgeDeletes}} is 0. Since 
{{earliestPutTs}} is a *global minimum across ALL HFiles* being compacted, a 
put for any other row, in any of those HFiles, can cause orphaned markers to be 
dropped before Phoenix CompactionScanner ever sees them.
h2. Fix

Set {{timeToPurgeDeletes = Long.MAX_VALUE}} in 
{{setScanOptionsForFlushesAndCompactions()}}. This short-circuits 
{{tryDropDelete()}} so HBase never purges delete markers – Phoenix 
CompactionScanner then applies its standard max-lookback logic.
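
To illustrate the short-circuit, here is a minimal, self-contained sketch of the 
decision as it is described in this issue. It is a simplified model, not the 
HBase source: the class {{DropDeleteModel}}, the {{Decision}} enum, and the 
parameter names are hypothetical, and the {{KeepDeletedCells}} handling is left 
out (the model assumes {{KeepDeletedCells=TTL}} with an unexpired TTL).

```java
class DropDeleteModel {
    enum Decision { KEEP, DROP }

    // Simplified stand-in for the check described above; NOT the HBase implementation.
    static Decision tryDropDelete(long markerTs, long earliestPutTs,
                                  long timeToPurgeDeletes, long now) {
        // A positive purge delay keeps every marker that is still younger than the delay.
        if (timeToPurgeDeletes > 0 && now - markerTs <= timeToPurgeDeletes) {
            return Decision.KEEP;
        }
        // Otherwise a marker older than every put in the compaction is dropped.
        return markerTs < earliestPutTs ? Decision.DROP : Decision.KEEP;
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        long markerTs = 500_000L;       // orphaned DeleteFamily marker
        long earliestPutTs = 900_000L;  // global minimum, set by a put for an unrelated row

        // timeToPurgeDeletes = 0: the marker is dropped before Phoenix sees it.
        System.out.println(tryDropDelete(markerTs, earliestPutTs, 0L, now));
        // timeToPurgeDeletes = Long.MAX_VALUE: the first check always keeps the marker.
        System.out.println(tryDropDelete(markerTs, earliestPutTs, Long.MAX_VALUE, now));
    }
}
```

With the delay at {{Long.MAX_VALUE}} the age comparison in the first branch holds 
for any realistic timestamp, so the {{earliestPutTs}} comparison is never 
reached in this model.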


h2. Orphan Delete Marker Lifecycle with this Fix

Same as for normal deleted rows:
||Marker age||Behavior||
|Within max-lookback|Retained|
|Outside max-lookback (but within TTL)|*Purged*|
|Outside TTL|Purged|

Users who need markers retained beyond max-lookback (for example, to tolerate 
replication lag) can use the per-table max-lookback override to extend the 
window up to the TTL.
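
The lifecycle table can be read as a small decision function. The sketch below 
is only an illustration under stated assumptions: {{MarkerLifecycle}}, 
{{Behavior}}, and both method names are hypothetical, the boundary is treated as 
inclusive, and capping the per-table override at the TTL is an assumed reading 
of "up to TTL", not confirmed Phoenix behavior.

```java
class MarkerLifecycle {
    enum Behavior { RETAINED, PURGED }

    // Mirrors the table: retained inside the max-lookback window, purged outside it.
    static Behavior afterMajorCompaction(long markerAgeMs, long maxLookbackMs) {
        return markerAgeMs <= maxLookbackMs ? Behavior.RETAINED : Behavior.PURGED;
    }

    // Assumption: a per-table override can widen the window, but not past the TTL.
    static long effectiveMaxLookback(long requestedMs, long ttlMs) {
        return Math.min(requestedMs, ttlMs);
    }

    public static void main(String[] args) {
        long hour = 3_600_000L;
        long ttl = 24 * hour;

        // Default window of 1 hour: a 2-hour-old orphan marker is purged.
        System.out.println(afterMajorCompaction(2 * hour, hour));
        // Per-table override of 6 hours: the same marker is retained.
        System.out.println(afterMajorCompaction(2 * hour, effectiveMaxLookback(6 * hour, ttl)));
    }
}
```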

  was:
Orphaned delete markers (DeleteFamily markers without corresponding puts) are 
dropped during major compaction before Phoenix CompactionScanner can process 
them.

*Issue -* 

HBase {{DropDeletesCompactionScanQueryMatcher.tryDropDelete()}} drops delete 
markers whose timestamp < {{earliestPutTs}} when {{KeepDeletedCells=TTL}} and 
{{timeToPurgeDeletes}} is 0. Since {{earliestPutTs}} is a *global minimum 
across ALL HFiles* being compacted, a put in any other row HFile can cause 
orphaned markers to be dropped before Phoenix CompactionScanner ever sees them.
h2. Fix

Set {{timeToPurgeDeletes = Long.MAX_VALUE}} in 
{{{}setScanOptionsForFlushesAndCompactions(){}}}. This short-circuits 
{{tryDropDelete()}} so HBase never purges delete markers – Phoenix 
CompactionScanner then applies its standard max-lookback logic.


>  Retain orphaned delete markers (without puts) during Phoenix compaction
> ------------------------------------------------------------------------
>
>                 Key: PHOENIX-7945
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7945
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Ujjawal Kumar
>            Assignee: Ujjawal Kumar
>            Priority: Minor
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
