[ https://issues.apache.org/jira/browse/PHOENIX-7945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ujjawal Kumar updated PHOENIX-7945:
-----------------------------------
Description:
Orphaned delete markers (DeleteFamily markers without corresponding puts) are
dropped during major compaction before Phoenix CompactionScanner can process
them.
h2. Issue
HBase {{DropDeletesCompactionScanQueryMatcher.tryDropDelete()}} drops delete
markers whose timestamp is less than {{earliestPutTs}} when
{{KeepDeletedCells=TTL}} and {{timeToPurgeDeletes}} is 0. Because
{{earliestPutTs}} is a *global minimum across ALL HFiles* being compacted
rather than a per-row value, puts for unrelated rows in other HFiles determine
it, and orphaned markers can be dropped before Phoenix CompactionScanner ever
sees them.
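To make the "global minimum" problem concrete, here is a simplified, self-contained model of the drop decision. It is not HBase source: the class and methods ({{DropDeleteModel}}, {{earliestPutTs}}, {{shouldDropDelete}}) are hypothetical, and the model covers only the two conditions named in this issue ({{timeToPurgeDeletes}} and the compaction-wide {{earliestPutTs}}), ignoring {{KeepDeletedCells}} and the other checks the real matcher performs.

```java
import java.util.List;

/** Hypothetical, simplified model of the drop-delete decision; not the real HBase matcher. */
public class DropDeleteModel {

    /** Minimum put timestamp across ALL HFiles in the compaction (global, not per row). */
    public static long earliestPutTs(List<long[]> putTimestampsPerFile) {
        // Starts at Long.MAX_VALUE, so with no puts every marker counts as older than all puts.
        long min = Long.MAX_VALUE;
        for (long[] file : putTimestampsPerFile) {
            for (long ts : file) {
                min = Math.min(min, ts);
            }
        }
        return min;
    }

    /** Returns true if the marker would be dropped before Phoenix CompactionScanner sees it. */
    public static boolean shouldDropDelete(long markerTs, long now, long timeToPurgeDeletes,
                                           long earliestPutTs) {
        // Grace period: with timeToPurgeDeletes > 0, recent markers are always kept.
        if (timeToPurgeDeletes > 0 && now - markerTs <= timeToPurgeDeletes) {
            return false;
        }
        // Marker older than every put in the compaction: treated as masking nothing.
        return markerTs < earliestPutTs;
    }

    public static void main(String[] args) {
        long now = 10_000L;
        // File 1: orphan DeleteFamily marker at ts=100 for row "a" (no puts for that row).
        // File 2: a put at ts=500 for an unrelated row "b".
        long globalMin = earliestPutTs(List.of(new long[] {}, new long[] {500L}));
        System.out.println(shouldDropDelete(100L, now, 0L, globalMin)); // true: orphan dropped
    }
}
```

In this model the put for row "b" alone sets {{earliestPutTs}} to 500, which is enough for the orphan marker on row "a" to be discarded.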
h2. Fix
Set {{timeToPurgeDeletes = Long.MAX_VALUE}} in
{{setScanOptionsForFlushesAndCompactions()}}. This short-circuits
{{tryDropDelete()}}, so HBase never purges delete markers; Phoenix
CompactionScanner then applies its standard max-lookback logic to them.
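Why {{Long.MAX_VALUE}} works can be checked with a small model of the grace-period guard. This is a hypothetical sketch ({{PurgeShortCircuit}} and {{withinGracePeriod}} are illustrative names, not HBase or Phoenix code), assuming the guard has the form {{timeToPurgeDeletes > 0 && now - markerTs <= timeToPurgeDeletes}}. For any non-negative marker timestamp, {{now - markerTs}} cannot exceed {{Long.MAX_VALUE}}, so the guard always says "keep" and the {{earliestPutTs}} comparison is never reached.

```java
/** Hypothetical model of the grace-period short-circuit; not HBase source. */
public class PurgeShortCircuit {

    /** Models the first check in the drop decision: true means "keep the marker". */
    public static boolean withinGracePeriod(long markerTs, long now, long timeToPurgeDeletes) {
        return timeToPurgeDeletes > 0 && now - markerTs <= timeToPurgeDeletes;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long veryOldMarker = 0L;   // epoch: the oldest possible non-negative timestamp
        long recentMarker = now - 1L;

        // timeToPurgeDeletes = 0 disables the grace period, so the marker falls through
        // to the earliestPutTs comparison and may be dropped.
        System.out.println(withinGracePeriod(veryOldMarker, now, 0L));             // false
        // timeToPurgeDeletes = Long.MAX_VALUE: now - markerTs <= Long.MAX_VALUE always holds
        // for markerTs >= 0, so every marker survives for Phoenix to judge.
        System.out.println(withinGracePeriod(veryOldMarker, now, Long.MAX_VALUE)); // true
        System.out.println(withinGracePeriod(recentMarker, now, Long.MAX_VALUE));  // true
    }
}
```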
h2. Orphan Delete Marker Lifecycle with This Fix
Orphaned markers follow the same lifecycle as normal deleted rows:
||Marker age||Behavior||
|Within max-lookback|Retained|
|Outside max-lookback (but within TTL)|*Purged*|
|Outside TTL|Purged|
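The table can be read as a classification on marker age ({{now - markerTs}}). The sketch below is hypothetical ({{OrphanMarkerLifecycle}}, {{zoneOf}} and {{retained}} are illustrative names), and it assumes the boundaries are inclusive (an age exactly equal to max-lookback counts as "within"); Phoenix CompactionScanner's actual boundary handling may differ.

```java
/** Hypothetical classifier for the orphan-marker lifecycle table; boundaries are assumed inclusive. */
public class OrphanMarkerLifecycle {

    public enum Zone { WITHIN_MAX_LOOKBACK, OUTSIDE_MAX_LOOKBACK_WITHIN_TTL, OUTSIDE_TTL }

    /** Maps a marker's age to one of the three rows of the table. */
    public static Zone zoneOf(long markerTs, long now, long maxLookbackMs, long ttlMs) {
        long age = now - markerTs;
        if (age <= maxLookbackMs) {
            return Zone.WITHIN_MAX_LOOKBACK;
        }
        return age <= ttlMs ? Zone.OUTSIDE_MAX_LOOKBACK_WITHIN_TTL : Zone.OUTSIDE_TTL;
    }

    /** Orphaned markers are retained only inside the max-lookback window. */
    public static boolean retained(Zone zone) {
        return zone == Zone.WITHIN_MAX_LOOKBACK;
    }

    public static void main(String[] args) {
        long hour = 3_600_000L;
        long now = 100 * hour;
        long maxLookback = hour;   // example: 1h max-lookback
        long ttl = 24 * hour;      // example: 24h TTL
        for (long ageHours : new long[] {0, 2, 48}) {
            Zone z = zoneOf(now - ageHours * hour, now, maxLookback, ttl);
            System.out.println(ageHours + "h -> " + z + " retained=" + retained(z));
        }
    }
}
```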
Users who need markers retained beyond max-lookback (e.g. to tolerate
replication lag) can use the per-table max-lookback override to extend the
window up to the TTL.
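The "up to TTL" rule amounts to a simple clamp on the effective retention window. This is a hypothetical helper ({{MaxLookbackOverride}} and {{effectiveMaxLookbackMs}} are illustrative, not Phoenix API), assuming a table-level override replaces the global max-lookback and is capped at the table's TTL.

```java
/** Hypothetical helper: per-table max-lookback override, capped at TTL. Not Phoenix API. */
public class MaxLookbackOverride {

    /**
     * Returns the window in which orphaned delete markers would be retained.
     * Assumption: a non-null table override replaces the global value and is capped at TTL.
     */
    public static long effectiveMaxLookbackMs(long globalMaxLookbackMs, Long tableOverrideMs,
                                              long ttlMs) {
        long requested = (tableOverrideMs != null) ? tableOverrideMs : globalMaxLookbackMs;
        return Math.min(requested, ttlMs);
    }

    public static void main(String[] args) {
        long hour = 3_600_000L;
        long ttl = 7 * 24 * hour;
        System.out.println(effectiveMaxLookbackMs(hour, null, ttl));            // no override
        System.out.println(effectiveMaxLookbackMs(hour, 48 * hour, ttl));       // extended, within TTL
        System.out.println(effectiveMaxLookbackMs(hour, 30 * 24 * hour, ttl));  // capped at TTL
    }
}
```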
> Retain orphaned delete markers (without puts) during Phoenix compaction
> ------------------------------------------------------------------------
>
> Key: PHOENIX-7945
> URL: https://issues.apache.org/jira/browse/PHOENIX-7945
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Ujjawal Kumar
> Assignee: Ujjawal Kumar
> Priority: Minor
--
This message was sent by Atlassian Jira
(v8.20.10#820010)